Translation Tools: New Approaches to an Old Discipline

Automated translation tools have been around for a long time, and new techniques are boosting their performance. But use them with caution.

Sometimes things get lost in translation.

For example, type the question, "Automated language translation, is it an idea whose time has come?" into Google's English-French translator, then enter the result into its French-German translator, and finally ask Google to translate the German back to English. You end up with this: "Automated language translation is it an idea, from which the time came?" Not too bad.

Now do the same with this sentence: "Reboot your computer and try again." You'll wind up with this: "Their computer and attempt still again load." Perhaps not good enough for your multilingual user manual.

Language translation software isn't likely to allow you to lay off your bilingual staffers, at least not right away. But applied with discrimination and lots of preparation, translation tools can be fantastic productivity aids. And researchers say new approaches to this old discipline are greatly improving the performance of the tools.

Ford Motor Co. began using machine translation software in 1998 and has so far translated 5 million automobile assembly instructions into Spanish, German, Portuguese and Mexican Spanish. Assembly manuals are updated in English every day, and their translations, some 5,000 pages a day, are beamed overnight to plants around the world.

"It wouldn't be feasible to do this all manually," says Nestor Rychtyckyj, a technical specialist in artificial intelligence (AI) at Ford.

The car maker uses Enterprise Global Server from Systran Software Inc. in San Diego, but licensing the software was just the first step in automating Ford's translation activities. High-level English instructions, such as "Install the muffler," are written by engineers and then parsed by a homegrown AI program into unambiguous detailed directions, such as "Attach bracket No. 423 using six half-inch bolts." Each instruction is then stored as a record in a translation database.

Ford also had to develop dictionaries of terms and phrases that are unique to automobile assembly and to Ford. "Most of the effort we spend on this system is building glossaries, and they change frequently," Rychtyckyj says. "But your translation results are a lot better if you put in a lot of work upfront."

Still, he says, it may be easier to maintain a glossary than to find a translator who speaks English and Portuguese and understands automobile technology and terms.

Systran's tool uses a tried-and-true translation technique called rules-based translation. Such systems use bilingual dictionaries combined with electronic style guides containing usage and grammar rules. (For example, in English, the verb usually follows the subject, but in German, it often comes at the end of the sentence.) These commercial translators are typically supplemented with application-specific glossaries like those used at Ford.

They are often also combined with translation memories, databases of previously translated text in the form of source and target sentence pairs. These memories are usually compiled over time by users. If the translation system (or a human) finds an exact match for the sentence it's trying to translate, it just retrieves the corresponding sentence in the target language from the database. It can also do this for near, or "fuzzy," matches, flagging them for review by a human translator.
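The exact-then-fuzzy lookup described above can be sketched roughly as follows. This is a minimal illustration, not how Systran or any commercial product implements it: the sample sentence pairs, the `lookup` function and the 0.8 similarity threshold are all assumptions made for the example, and the similarity measure is simply the one in Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source sentence -> stored target sentence.
TRANSLATION_MEMORY = {
    "Reboot your computer and try again.": "Redémarrez votre ordinateur et réessayez.",
    "Attach the bracket using six bolts.": "Fixez le support à l'aide de six boulons.",
}

FUZZY_THRESHOLD = 0.8  # assumed cutoff for a "near" match


def lookup(sentence, memory=TRANSLATION_MEMORY, threshold=FUZZY_THRESHOLD):
    """Return (translation, match_type, needs_review) for one sentence."""
    # Exact match: reuse the stored translation as-is.
    if sentence in memory:
        return memory[sentence], "exact", False

    # Otherwise find the most similar stored source sentence.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score

    # Fuzzy match: reuse the translation but flag it for a human translator.
    if best_source is not None and best_score >= threshold:
        return memory[best_source], "fuzzy", True

    # No usable match: the sentence has to be translated some other way.
    return None, "none", True
```

Calling `lookup("Reboot your computer and try again")` (no final period) would come back as a fuzzy match flagged for review, while a sentence with no close neighbor in the memory returns no translation at all.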

Training the Software

Statistical machine translation is a newer technique that's not yet in widespread use. It uses collections of documents and their translations to train software. Over time, these data-driven systems learn what makes a good translation and what doesn't, and then use probability and statistics to decide which of several possible translations of a given word or phrase is most likely correct based on context.

Statistical systems require large volumes of documents for training the algorithms, but they don't require grammatical rules, bilingual dictionaries or translation memories. The systems, in effect, develop their own rules and continue to fine-tune them over time.
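To give a feel for how a system can "develop its own rules" from nothing but paired sentences, here is a toy sketch of IBM Model 1, a classic textbook word-translation model. It is offered only as an illustration of the general idea; the article does not say which models Google, Microsoft or Language Weaver use, and the three-sentence corpus and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Tiny hypothetical parallel corpus: (English source, German target).
PAIRS = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]


def train_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with a few EM iterations."""
    # Start with every pairing equally plausible.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            src_words, tgt_words = src.split(), tgt.split()
            for tw in tgt_words:
                # Share each target word among the source words in proportion
                # to the current estimates (the expectation step).
                norm = sum(t[(tw, sw)] for sw in src_words)
                for sw in src_words:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Re-estimate probabilities from the fractional counts (maximization).
        t = defaultdict(
            float, {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
        )
    return t


def best_translation(t, source_word):
    """Pick the target word with the highest learned probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get) if candidates else None


model = train_model1(PAIRS)
```

No dictionary is supplied, yet after training `best_translation(model, "house")` yields "Haus": "das" also appears alongside "the book," so the model learns to attribute it to "the" rather than to "house." Real systems apply the same principle to millions of sentence pairs and to phrases rather than single words.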

Google Inc. uses Systran's rules-based software but is also developing its own statistical-based systems to translate to and from Arabic, Chinese and Russian. Those languages are especially tough for machine translators because their structures are so different from Western Romance languages, says Franz Josef Och, a research scientist at Google.

Och says Google will keep its advanced translation technologies secret, but corporate Web sites may include a link to Google's translation tools at www.google.com/language_tools for free.

For some years, Microsoft Corp. has incorporated a rules-based natural-language parser in its Word software. More recently, it has used a combination of translation memories, rules-based and statistical-based machine translations, and humans to translate documents for its customer support knowledge base.

"The new direction in the research community is to see how you can combine these purely statistical techniques with some linguistic knowledge," says Steve Richardson, a senior researcher at Microsoft. "It's modeling the rules with the statistical methods."

The biggest user of Microsoft's translation software may well be Microsoft itself, which has an annual translation budget in the hundreds of millions of dollars. At one time, only 5% to 10% of its customer support documents were translated from English, because there was simply too much material, Richardson says. Now, that same percentage is translated by humans and the rest is done by computers.

Good Enough

Automated translation in the corporate world succeeds to the extent that users are willing to carefully customize systems to their unique needs and vocabularies, he says. And the technology is most appropriate when translations don't have to be perfect. "We have serviced thousands and thousands of customers with articles we have machine-translated," Richardson says. "It's not perfect, but it's good enough. They get an answer without calling in. What's that worth to the company?"

Asked if translation breakthroughs are on the horizon, he says, "The breakthroughs from a research perspective have already happened. The breakthrough on the practical side will come in creating systems that are integrated into the workflows of [user] companies."

That is precisely what FedEx Corp. is doing. Late in 2005, after an 18-month evaluation of various products and services, the Memphis-based delivery company began rolling out Trados GXT, a product of Maidenhead, England-based SDL International. It consists of translation memories integrated with an enterprise translation workflow system.

The plan is that eventually any user anywhere in the company will be able to upload documents for translation, and that an integrated system will manage the entire process by which customer-facing information is translated and published.

FedEx is also expanding the system to enable the translation of documents going to overseas employees such as salespeople. "It's an infrastructure component," says Tracci Schultz, an IT manager at FedEx. "It has databases, workflow, GUIs, all the things needed to integrate into our content management systems and into our [application] code repositories."

But Schultz is careful to point out that the system does not do actual machine translations. It can do much of the translation task by finding matching sentences in the translation memories, but whatever can't be found there is not passed through a rules-based or statistical-based system; it's sent to an outside provider of human-based translation services.

"There's sensitivity to the context and how we communicate with the customer," Schultz explains. "We are very conscientious about having people who understand our brand and our tone, and they reflect that in their translations."

To help it manage its translation outsourcing, FedEx went from 40 translation vendors to two during the introduction of its enterprise translation system, Schultz says, adding that the company will likely use those vendors' services less and less as its system's translation memories grow. She says FedEx hopes to get to the point where 80% of its translation workload is translated via memories and 20% by humans.

Meanwhile, translation systems are becoming more sophisticated by combining multiple methods. A statistical machine translation product from Language Weaver Inc. in Marina del Rey, Calif., can now be used with translation management software called WorldServer from Idiom Technologies Inc. Customers can tap into WorldServer to retrieve previously translated content in a translation memory or generate new translations through Language Weaver's algorithms when no matches are found.

"The two methods complement each other," says Dave Rosenlund, a vice president at Waltham, Mass.-based Idiom. Customers can find the maximum amount of translation reuse in translation memory, then complete any sentences that have not been previously translated, he explains, noting that the resulting document can then be passed to a human translator for review.
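The division of labor Rosenlund describes might look something like the sketch below: reuse whatever the memory already holds, send only the remainder to a machine translation engine, and record where each sentence came from so a reviewer knows what to check. The function names, the record layout and the stand-in `fake_mt` engine are all hypothetical; nothing here reflects the actual WorldServer or Language Weaver interfaces.

```python
def translate_document(sentences, memory, machine_translate):
    """Reuse stored translations; fall back to MT for unmatched sentences."""
    results = []
    for sentence in sentences:
        if sentence in memory:
            # Previously translated content is reused unchanged.
            target, origin = memory[sentence], "memory"
        else:
            # New content goes to the machine translation engine.
            target, origin = machine_translate(sentence), "machine"
        results.append({"source": sentence, "target": target, "origin": origin})
    return results


def reuse_rate(results):
    """Fraction of sentences that were served from the translation memory."""
    if not results:
        return 0.0
    reused = sum(1 for r in results if r["origin"] == "memory")
    return reused / len(results)


def fake_mt(sentence):
    """Stand-in for a real MT engine; it only tags the text."""
    return "[MT] " + sentence


memory = {"Reboot your computer and try again.": "Redémarrez votre ordinateur et réessayez."}
document = ["Reboot your computer and try again.", "Call support if the problem persists."]
results = translate_document(document, memory, fake_mt)
```

In this two-sentence example, half the document is served from memory. Tracking that ratio over time is one way a company could measure progress toward a reuse goal such as the 80% figure FedEx cites.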

Hybrids on the Horizon

Such hybrid systems, which combine translation memories and machine translation based on rules or statistics or both, are the wave of the future, researchers say, and they are becoming more sophisticated and complex.

At SRI International in Menlo Park, Calif., for example, researchers are working with the U.S. Department of Defense to automate the translation of Arabic and Mandarin Chinese structured and unstructured text as well as real-time speech into English.

In essence, SRI's approach is to do machine translations with the best available rules-based and statistical-based systems, and then have another system that adjudicates among them in real time to find the best translation.

Jordan Cohen, a senior scientist at SRI, says, "We get a system combination answer by combining the results of five systems. It uses a process that takes into account the particular order of the output for each sentence in each system and the probability that that particular system produces good answers."
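One simple way to picture that kind of adjudication is a weighted vote over ranked outputs, sketched below. This is an assumption-laden illustration, not SRI's algorithm: the article gives no details beyond Cohen's description, so the scoring formula (system reliability divided by rank position), the system names and the reliability numbers are all invented for the example, and three systems are used instead of five for brevity.

```python
from collections import defaultdict


def combine(nbest_lists, reliability):
    """Pick the hypothesis with the highest reliability-weighted rank score."""
    scores = defaultdict(float)
    for system, hypotheses in nbest_lists.items():
        for rank, hypothesis in enumerate(hypotheses):
            # A system's first choice counts fully, its second choice half,
            # and so on, scaled by how often that system tends to be right.
            scores[hypothesis] += reliability[system] / (rank + 1)
    return max(scores, key=scores.get)


# Hypothetical ranked outputs from three translation systems.
nbest_lists = {
    "rules_a": ["Restart your computer and try again.", "Reboot computer and try again."],
    "stat_a": ["Reboot your computer and try again.", "Restart your computer and try again."],
    "stat_b": ["Reboot your computer and try again."],
}

# Assumed probability that each system produces a good answer.
reliability = {"rules_a": 0.5, "stat_a": 0.8, "stat_b": 0.7}

winner = combine(nbest_lists, reliability)
```

Here `winner` is "Reboot your computer and try again." because two fairly reliable systems rank it first, which outweighs the single first-place vote and one second-place vote for the "Restart" variant.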

Users should not be surprised when garbage translations come from garbage input, regardless of system sophistication. No matter how smart these systems ultimately become, details will still count, says Ford's Rychtyckyj. "You can improve the translation quality a lot by improving the construction of the source text," he says. "Put articles in front of nouns, use the correct punctuation, and use proper English grammar."

Also, he advises, you need to manage user expectations. "Tell them they are not going to get perfect translations in all cases. Our users love to find examples of translations that come out with silly results."

Perhaps Rychtyckyj could suggest to his users, "Their computer and attempt still again load."

How One Automated Translation System Works

In Language Weaver's automated translation software, the translated material used to train the system comes in various formats. Once the data is collected, parallel documents in different languages are identified and aligned, sentence by sentence, to create a parallel corpus. The learner processes this corpus and extracts statistical probabilities, patterns and rules to create the translation parameters (used to find the most accurate translation) and the language model (used to find the most fluent translation). Both are used to create a new language pair for translations between two languages.
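A rough sketch of how those two components could work together at translation time: each candidate translation gets an accuracy score from the translation parameters and a fluency score from the language model, and the candidate with the best combined score wins. Everything concrete here is an assumption for illustration, including the tiny training text, the add-one-smoothed bigram language model and the candidate probabilities; Language Weaver's actual models are not described in this article.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Build a bigram language model with add-one smoothing."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])              # contexts only
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    vocab_size = len(set(unigrams) | {"</s>"})

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )

    return log_prob


def pick_translation(candidates, lm_log_prob):
    """candidates maps each target sentence to its translation-model probability."""
    return max(
        candidates,
        key=lambda c: math.log(candidates[c]) + lm_log_prob(c),
    )


# Hypothetical target-language text used to learn what fluent English looks like.
lm = train_bigram_lm([
    "reboot your computer and try again",
    "try again later",
    "restart your computer",
])

# Hypothetical translation-model probabilities for two candidate outputs.
candidates = {
    "reboot your computer and try again": 0.4,
    "their computer and attempt still again load": 0.5,
}

best = pick_translation(candidates, lm)
```

Even though the garbled candidate has the slightly higher translation-model probability in this made-up example, the language model penalizes its unfamiliar word sequences heavily, so `best` is the fluent sentence.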

Copyright © 2007 IDG Communications, Inc.