
Latest Advances in Text Alignment

October 11th 2017 12:30 - 12:45
Room C

Text alignment has to date been very much the unwanted and unloved topic in localization. Classical approaches to corpus alignment have adopted the standard Gale and Church approach from 1993, which presupposes equal segments between the source and target languages. In the real world this is very rarely the case: for example, the source and target documents are often not the same version. In addition, in an unconstrained translation environment the translator can make arbitrary decisions, such as omitting sentences, rendering multiple sentences as one sentence or vice versa, or completely 'reworking' the document.

Existing alignment approaches cannot cope with this typical scenario, and the only solution is for the translator doing the alignment to match segments manually: a very tedious and unrewarding task.

Fortunately, Big Data has come to the rescue: it has recently become possible to harvest massive-scale lexicons from publicly available resources on the Internet, covering a very large number of languages. Using large-scale multilingual lexicons, it is now possible to significantly increase the accuracy of segment alignment and thus fully automate even the most difficult alignment projects, relieving translators of the tedious task of manual alignment.
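To make the idea of lexicon-based alignment concrete, here is a minimal sketch in Python. It is not the method presented in the talk: it is a toy illustration that assumes a tiny hand-written English-French lexicon (LEXICON), a simple word-overlap score (overlap) and an arbitrary threshold of 0.3, all of which are hypothetical. A dynamic programme then chooses among 1-1, 2-1 and 1-2 matches and unmatched segments, so it can handle merged sentences and inserted text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: source word -> acceptable target words.
LEXICON: Dict[str, Set[str]] = {
    "the": {"le", "la"}, "cat": {"chat"}, "sleeps": {"dort"},
    "dog": {"chien"}, "barks": {"aboie"}, "it": {"il"}, "rains": {"pleut"},
}

# Allowed alignment steps as (source segments, target segments) consumed:
# 1-1 match, source omitted, target inserted, 2-1 merge, 1-2 split.
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def overlap(src: List[str], tgt: List[str]) -> float:
    """Share of words explained by the lexicon, relative to the longer side."""
    src_words = [w for seg in src for w in seg.lower().split()]
    tgt_words = [w for seg in tgt for w in seg.lower().split()]
    if not src_words or not tgt_words:
        return 0.0
    tgt_set = set(tgt_words)
    hits = sum(1 for w in src_words if LEXICON.get(w, set()) & tgt_set)
    return hits / max(len(src_words), len(tgt_words))


def align(source: List[str], target: List[str],
          threshold: float = 0.3) -> List[Tuple[List[int], List[int]]]:
    """Return aligned groups of (source indices, target indices)."""
    n, m = len(source), len(target)
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in MOVES:
                if i < di or j < dj:
                    continue
                if di == 0 or dj == 0:
                    gain = 0.0  # leaving a segment unmatched is neutral
                else:
                    # A match only pays off if its overlap beats the threshold.
                    gain = overlap(source[i - di:i], target[j - dj:j]) - threshold
                cand = best[i - di][j - dj] + gain
                if cand > best[i][j]:
                    best[i][j] = cand
                    back[i][j] = (di, dj)
    # Walk the back-pointers from the end to recover the chosen steps.
    groups = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        groups.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    groups.reverse()
    return groups


# Example: the translator merged two sentences and inserted an extra one.
SOURCE = ["The cat sleeps", "The dog barks", "It rains"]
TARGET = ["le chat dort et le chien aboie", "publicité", "il pleut"]
RESULT = align(SOURCE, TARGET)
print(RESULT)
```

In this example the first two source sentences are grouped with the single merged target sentence, the inserted target sentence is left unmatched, and the last sentences are paired one to one. A real system would need a far larger lexicon, proper tokenization and a better-tuned scoring function.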

Andrzej Zydroń

CTO at XTM International

Andrzej Zydroń is one of the leading IT experts on Localization and related Open Standards. Zydroń sits or has sat on the following Open Standard Technical Committees: LISA OSCAR GMX, LISA OSCAR xml:tm, LISA OSCAR TBX, W3C ITS, OASIS XLIFF, OASIS T...

