Techno Translation

Faced with a severe shortage of linguists, federal agencies are turning to computers for help.

Imagine a world where language is not a barrier. Coast Guard officials could board foreign vessels entering U.S. waters and easily establish the crews' intent; U.S. troops in Iraq would detain far fewer (and far more valuable) people suspected of aiding the insurgency there; FBI case officers immediately could be alerted to the plans of suspected terrorists.

Such a world still is well in the future, but advances in computational linguistics, a field of study that melds computer science and language, are giving federal officials reason to be optimistic.

From his office in Northern Virginia, Joseph Olive, a program manager at the Defense Advanced Research Projects Agency, demonstrates a computer program by BBN Technologies in Cambridge, Mass., that translates live broadcasts from the Middle East and China into English. The BBN Broadcast Monitoring System creates a continuous searchable archive of international TV broadcasts, transcribes the speech into text and then uses a tool called Language Weaver to translate the Arabic and Chinese text into English. There is about a five-minute lag between the broadcast and the English translation, which appears as text on his computer screen. The translation is not perfect (odd syntax garbles the meaning at times), but it is certainly impressive.

"Ten years ago, nobody would have imagined this," says Olive. In a nutshell, computer programs model language use, based on detailed analysis of the sounds of a given language. It is a probabilistic process: certain sounds are more likely to form certain combinations in words; certain words are more likely to follow others in sentences.

"Once you have the text in the foreign language, now you again begin a probabilistic process because every word in the foreign language may be translated into a whole bunch of English words. The question is, which is the right one," Olive says.

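The word-choice problem Olive describes can be illustrated with a toy sketch. The Python below is not BBN's or Language Weaver's method; it is a minimal illustration that assumes two hypothetical probability tables: one giving the chance that a foreign word maps to a particular English word, and one giving the chance that one English word follows another. The source tokens (S1, S2, S3), the English candidates and every number are invented for the example.

```python
import itertools
import math

# Hypothetical toy tables; a real system would estimate these from large corpora.
# TRANSLATION[f][e] = assumed probability that foreign word f maps to English word e.
TRANSLATION = {
    "S1": {"the": 1.0},
    "S2": {"river": 1.0},
    "S3": {"bench": 0.6, "bank": 0.4},  # ambiguous word: two English candidates
}

# BIGRAM[(prev, word)] = assumed probability that `word` follows `prev` in English.
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "river"): 0.1,
    ("river", "bank"): 0.5,
    ("river", "bench"): 0.01,
}
UNSEEN = 1e-4  # small floor for word pairs missing from the toy table


def score(source, english):
    """Sum of log translation and log word-sequence probabilities."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for f, e in zip(source, english):
        total += math.log(TRANSLATION[f][e])
        total += math.log(BIGRAM.get((prev, e), UNSEEN))
        prev = e
    return total


def best_translation(source):
    """Try every combination of candidate words and keep the highest-scoring one."""
    candidates = itertools.product(*(TRANSLATION[f] for f in source))
    return max(candidates, key=lambda english: score(source, english))


print(best_translation(["S1", "S2", "S3"]))
```

Taken alone, the ambiguous word S3 is more likely to mean "bench," but because "bank" is far more likely to follow "river" in the toy word-sequence table, the combined score favors "the river bank." Trying every combination is practical only for very short inputs; it is used here purely to keep the sketch readable.
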
"There is no perfect Star Trek translator," says Larry Goodell, the Defense executive overseeing the advanced concepts and technology demonstration program evaluating language and speech exploitation resources, known as the LASER ACTD. The program, which involves the FBI, Homeland Security Department and other agencies, is designed to evaluate dozens of technologies and quickly move those with potential into the field.

According to Goodell, one of the most promising technologies to come out of the LASER ACTD thus far is the document exploitation suite, or DOCEX Suite. The Army has been using DOCEX Suite in Iraq, along with a tool called Harmony DOCEX, to scan written material collected in the field, electronically translate it where possible and then create a searchable library of those documents.

The translation quality ranges from useless to very useful, depending on the quality of the text (handwritten documents remain nearly impossible to scan and translate). But the tools have been immeasurably helpful in conducting triage (determining which documents need to be translated immediately by a skilled linguist and which can wait) and then managing the workflow for linguists who might be working from various locations around the world.

"Even if you can only translate 5 percent of the words, you can still get value out of that," says David Place, deputy director of the technology applications directorate at the Army's Intelligence and Security Command, headquartered at Fort Belvoir, Va. "That's 5 percent you didn't have before. This is an incremental process."