Historians and futurologists alike have long been fascinated by the idea of communication without language barriers. Science fiction luminaries have given their imaginations free rein, creating the Babel Fish and the Universal Translator, while history gives us the famous Rosetta Stone, which bears the same decree in three scripts: hieroglyphic, Demotic, and ancient Greek. In an increasingly flat world, it is more important than ever to be able to get a message across, regardless of language.
Many approaches to this problem have been tried. Machine translation (MT) has been around for decades, yet despite years of research investment it has not become ubiquitous. Back in the 1960s, researchers coined an acronym for what they wanted translation to achieve: FAHQT, Fully Automatic High-Quality Translation of General Text. That goal proved unrealistic, and over time a perhaps more accurate acronym was coined: FAUT, Fully Automatic Useful Translation. The aim was not to compete with a human translator but to build a system accurate enough to deliver real-time translations that were useful to the average user.
At Microsoft, researchers have been working toward this goal for more than a decade. Our approach combines rule-based logic with statistical methods in a hybrid statistical-syntactic system. For language pairs (a source language and a target language) for which substantial linguistic information is available, we apply grammar and syntax knowledge in pre- and post-processing stages around a statistical core engine. Where less information is available, we fall back on a purely statistical model that scales well to a large number of language pairs.
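The shape of this hybrid design can be illustrated with a deliberately tiny sketch. It is not Microsoft's engine. The phrase table, the part-of-speech lexicon, the adjective-reordering rule, and every function name (`preprocess`, `statistical_core`, `postprocess`, `translate`) are hypothetical, and the "statistical core" is reduced to greedy longest-match lookup that picks the most probable entry from a hand-written phrase table. A real system would learn its models from parallel corpora and search over many candidate translations. The sketch only shows the structure: linguistic processing wrapped around a statistical step when resources exist for a language pair, and the statistical step alone when they do not.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
# A real statistical engine would learn such a table from large parallel corpora.
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]


def preprocess(tokens: List[str], pos: Dict[str, str]) -> List[str]:
    """Hypothetical syntactic pre-processing: move an adjective after the noun it
    precedes, as Spanish usually expects (English 'red house' -> 'house red')."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if pos.get(out[i]) == "ADJ" and pos.get(out[i + 1]) == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out


def statistical_core(tokens: List[str], table: PhraseTable, max_len: int = 3) -> List[str]:
    """Greedy decoding: at each position use the longest source phrase found in the
    table and emit its most probable translation; unknown words pass through."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                best, _prob = max(table[phrase], key=lambda cand: cand[1])
                out.append(best)
                i += n
                break
        else:
            out.append(tokens[i])  # no match of any length: copy the word unchanged
            i += 1
    return out


def postprocess(words: List[str]) -> str:
    """Post-processing: detokenize and capitalize the first letter of the sentence."""
    text = " ".join(words)
    return text[:1].upper() + text[1:]


def translate(sentence: str, table: PhraseTable, pos: Optional[Dict[str, str]] = None) -> str:
    """Hybrid path when linguistic resources (here, a POS lexicon) exist for the
    language pair; purely statistical path otherwise."""
    tokens = sentence.lower().split()
    if pos is None:
        return " ".join(statistical_core(tokens, table))
    return postprocess(statistical_core(preprocess(tokens, pos), table))


# Hypothetical English -> Spanish data for illustration only.
table: PhraseTable = {
    ("the",): [("la", 0.7), ("el", 0.3)],
    ("house",): [("casa", 0.9), ("hogar", 0.1)],
    ("red",): [("roja", 0.6), ("rojo", 0.4)],
}
pos = {"red": "ADJ", "house": "NOUN"}

print(translate("The red house", table, pos))  # La casa roja
print(translate("The red house", table))       # la roja casa
```

Without the `pos` lexicon the adjective stays in English order ("la roja casa"), which is the kind of error the syntactic layer is meant to catch. The purely statistical path gives up that quality in exchange for needing nothing beyond the phrase table, which is why it scales to many language pairs.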
The machine translation technology we developed has already proven very valuable within Microsoft: since 2003 it has been used to translate nearly 140,000 Knowledge Base articles into nine major languages. Many other teams within the company use it to lower the cost and extend the coverage of their localization efforts. In 2005, the MT team was asked to broaden its scope, and since then we have focused on extending the technology's use beyond the company. Our general-domain translation Web service is available through search (translating search results), Microsoft Office (snippet and document translation), Windows Live Messenger (as a translation bot), and other products (see microsofttranslator.com).
Efforts to provide general-domain translation services for the Web are gaining momentum. The true value of MT lies not only in the quality of the translation engine's output but also in how translations are delivered within real scenarios and in the means provided to address quality problems. Unlike that of search, news, entertainment, or gaming, the perceived value of translation has been limited by historically inconsistent translation quality. Product builders therefore need to understand how to get the most out of MT.
Today most translation offerings are portals and dedicated translation sites, but the greatest value of machine translation will come from making it a basic, essential ingredient of scenarios that target a linguistically diverse audience. We believe it is important to give developers, communities, and content creators the means to integrate translation into their workflows and to use it to harness the power of their communities.
The MSDN Translation Wiki is a good example of these principles in practice. The community is empowered both to help improve the quality of the translations and to contribute new content, combining core technology with the power of community.
Machine translation is an “imperfect” technology: as with search results, the onus is on users to judge whether the output is appropriate for their purposes. Also like search, it can at times be surprisingly accurate, and it can keep improving as new data becomes available. Microsoft is making significant investments in improving translation quality.
I predict that this year will be a significant one for machine translation. Combined with the power of an increasingly social Web, machine translation is beginning to deliver on its potential. In the coming months, watch for new and exciting approaches to machine translation that bridge language gaps around the world.
From: MSDN January 2009 issue