Vikram's Web Archive

Since interestingness isn't a search option…

The Emergence of Machine Translation

Historians and futurologists alike have been fascinated with the idea of communication without language barriers. Science fiction luminaries have given flight to their thoughts in creating the Babel Fish or the Universal Translator, while history gives us the legendary Rosetta Stone, on which a single decree was inscribed in three scripts. In an increasingly flattened world, it is more important than ever to be able to get a message across, irrespective of language.

Many approaches have been adopted over the years to solve this problem. Machine translation (MT) is one that’s been around for decades yet has failed to become ubiquitous even after years of investment in research. Back in the 1960s an acronym was coined for what researchers wanted to achieve with translation: FAHQT (Fully Automatic High-Quality Translation of General Text). This idealistic goal proved to be unrealistic, and over time a perhaps more accurate acronym was coined: FAUT (Fully Automatic Useful Translation). The goal was not to compete with a human translator but to create a system accurate enough to deliver real-time translations that were useful to the average user.

At Microsoft, researchers have been working toward this goal for more than a decade. The approach that the researchers here adopt combines rule-based logic with statistical methods, creating a hybrid statistical syntactic system. For language pairs (original and translated text) where we can make use of substantial linguistic information, we utilize grammar and syntax knowledge in pre- and post-processing around a statistical core engine. Where we do not have as much information, we resort to a purely statistical model that scales well to a large number of language pairs.
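The split described above can be pictured with a small sketch. This is only an illustration of the routing idea, not the actual Microsoft system: the names `LinguisticRules`, `RULES`, `statistical_core`, and `translate` are hypothetical, and the "statistical core" here just tags the text with the language pair instead of translating it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

LanguagePair = Tuple[str, str]  # (source language, target language)


@dataclass
class LinguisticRules:
    """Hypothetical rule-based hooks available for one language pair."""
    preprocess: Callable[[str], str]
    postprocess: Callable[[str], str]


def statistical_core(text: str, pair: LanguagePair) -> str:
    """Stand-in for a statistical engine: tags the text, does not translate."""
    source, target = pair
    return f"[{source}->{target}] {text}"


# Toy registry (an assumption): only pairs with rich linguistic
# information get pre- and post-processing rules.
RULES: Dict[LanguagePair, LinguisticRules] = {
    ("en", "es"): LinguisticRules(
        preprocess=str.strip,  # tidy the input before the core engine
        postprocess=lambda s: s if s.endswith(".") else s + ".",
    ),
}


def translate(text: str, pair: LanguagePair) -> str:
    """Wrap the core with rules when they exist, else use the core alone."""
    rules = RULES.get(pair)
    if rules is None:
        # Little linguistic information: purely statistical path.
        return statistical_core(text, pair)
    # Hybrid path: rules before and after the statistical core.
    return rules.postprocess(statistical_core(rules.preprocess(text), pair))
```

Calling `translate(text, ("en", "es"))` takes the hybrid path, while a pair missing from the registry, such as `("en", "sw")`, falls through to the statistical core alone. Keeping the rules in a lookup table means a new language pair works immediately and can gain linguistic processing later without changes to the core.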

The machine translation technology that we developed has already proven itself to be very valuable within Microsoft, having been used since 2003 to translate nearly 140,000 Knowledge Base articles into nine major languages. Many other teams within the company have been using the technology to lower the cost and improve the coverage of their localization efforts. In 2005, the MT team was asked to broaden its scope, and since then we have been focused on expanding the use of this technology outside the company. Our general domain translation Web service has been exposed through search (providing translation functionality to search results), Microsoft Office (providing snippet and document translations), Windows Live Messenger (as a translation bot), and others (see microsofttranslator.com).

Efforts to provide general domain translation services for the Web are gaining momentum. The true value of MT is not just in the quality delivered by the translation engine but in how these translations are delivered within scenarios and the means to address any issues of quality. Unlike search, news, entertainment, or gaming, translation has had limited perceived value because its quality has historically been inconsistent. It is critical for product builders to understand how to maximize the potential of MT.

Today, most translation offerings are portals and translation sites. The greatest value of machine translation, however, will be as a basic and essential ingredient for scenarios that target a linguistically diverse audience. We believe that it is important to provide the means for developers, communities, and content creators to integrate translation into their workflows and use it as a means to harness the power of community.

The MSDN Translation Wiki is a good example of these principles in practice. The community is empowered to help improve the quality of the translation and also to contribute new content—a combination of core technology and the power of community.

Machine translation is an “imperfect” technology and, not unlike evaluating search results, puts the onus on the user to apply her judgment toward the appropriateness of the results delivered. Also, like search, it has the possibility of being surprisingly accurate at times and displays the scope to continually improve with new data. Microsoft is making significant investments in increasing the quality of translations.

I predict that this year will be a significant one for machine translation. In conjunction with the massive power that can be harnessed from the increasingly social Web, machine translation is beginning to deliver on its potential. In the coming months, watch for new and exciting approaches with machine translation that will bridge language gaps across the world.

From: MSDN January 2009 issue

A few test phrases:
microcredit
Lifeboat Economics
The Long Tail
Ubiquitous
