Introduction to Translation Standards
From Aspirationtech.org Wiki
Main discussion topics:
- Main translation standards introduced: TMX, XLIFF, TBX
Ideas for further discussion this week
- Standards gaps: quality control, translation management workflow
- Need for open source input into standards like XLIFF
Discussion Notes
- Building tools that are standards-based makes them more usable
- Why do we need standards in tools?
- How can people reuse translation memory (TM) in their work? A TM is a database of aligned translation pairs. There are standards for the size of the units that are matched 1:1 (phrases, sentences, etc.). TMs are good for a variety of purposes, like statistical machine translation.
- TMX is an XML-based file format and the standard format for translation memories (source text, target text, and the languages they are in)
- Standards for time-aligned translation?
- XLIFF – take segmented documents (say Word or PDF), open them in an XLIFF-supporting tool, translate, save, and then convert back to the original document format. Supports human (rather than machine) translation tools
- TBX – simple online dictionary format for storing terminology
- TBX Basic – simpler form of TBX
- TEI – Text Encoding Initiative – standards for lexicography and, probably, syntactic markup
- Problems: translations locked into translation tools
- SRX – standard for specifying segmentation. Works with XLIFF to help you segment the source document. See OmegaT tool.
- Standards for syntactic and semantic markup?
- Need more open source input into XLIFF
- ISO 639-3, Unicode, UTF-8 (briefly mentioned)
- GMX-V – trying to quantify the complexity of words (standard for counting words)
- No standards that we know of around quality control
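
To make the TMX item in the notes above more concrete, here is a minimal sketch of reading aligned translation pairs out of a TMX file with Python's standard library. The sample document, the header attribute values, and the function name read_pairs are illustrative assumptions, not something shown in the session; real TMX files from translation tools usually carry more metadata and inline markup.

```python
import xml.etree.ElementTree as ET

# xml:lang is parsed into the predefined XML namespace by ElementTree.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit translation memory, made up for illustration.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the two given languages."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    pairs = []
    # Each <tu> (translation unit) holds one <tuv> (variant) per language.
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


pairs = read_pairs(SAMPLE_TMX, "en", "fr")
for source, target in pairs:
    print(source, "=>", target)
```

Because the pairs come out as plain tuples, the same memory can be reused outside the tool that produced it, for example as training data for statistical machine translation, which is the kind of reuse the notes describe.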
