In this example, the Spanish source sentence "Haga clic en la ficha" and the matching English target sentence "Click the tab" are parsed, using the Morphology, Sketch, Portrait, and Logical Form components, into their respective source and target Logical Forms (LFs). Both LFs then undergo statistical processing to identify word associations (e.g., "ficha" and "tab") and the "alignment" of their structures.
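To make the idea of statistical word association concrete, here is a minimal sketch in Python. The article does not say which statistic the NLP Group uses, so the Dice coefficient below is an assumption chosen for simplicity, the function name `dice_associations` is hypothetical, and the toy corpus merely stands in for real sentence pairs. It also works on raw words, whereas the system described here works on Logical Forms.

```python
from collections import Counter
from itertools import product

def dice_associations(pairs):
    """Score source/target word pairs by the Dice coefficient of their
    sentence-level co-occurrence counts (an assumed statistic)."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

# Hypothetical toy corpus, not data from the article.
pairs = [
    ("Haga clic en la ficha", "Click the tab"),
    ("Cierre la ficha", "Close the tab"),
    ("Haga clic en el icono", "Click the icon"),
    ("Abra la ventana", "Open the window"),
]
scores = dice_associations(pairs)

# Pick the English word most strongly associated with "ficha".
best = max((t for s, t in scores if s == "ficha"),
           key=lambda t: scores[("ficha", t)])
print(best, scores[("ficha", best)])
```

On this tiny corpus "ficha" pairs most strongly with "tab" because the two words always appear together and never apart. With so few sentences, function words such as "la" and "the" also score well, which hints at why structure, and not just word counts, matters for alignment.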
"This is done for 350,000 sentence pairs in English and Spanish, applying both heuristics (rules) and statistics to find bits of structural alignment across the language boundary. Most of the MT work has gone into the alignment phase, figuring out which bits across that language boundary should align up and what context you need to save," says Dolan.
Rules help the system learn the appropriate context, narrowing the search. A probability is attached to each correspondence, or "mapping", for use during runtime. Finally, these transfer mappings are stored in the MindNet repository.
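As a rough illustration of attaching a probability to each mapping, the Python sketch below estimates probabilities by relative frequency over observed alignments. The article does not describe how the probabilities are actually computed, so this estimator, the function name `mapping_probabilities`, and the sample alignments are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def mapping_probabilities(observed):
    """Estimate P(target | source) for each mapping by relative frequency
    (an assumed estimator, not the one used in the article's system)."""
    counts = defaultdict(Counter)
    for source, target in observed:
        counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }

# Hypothetical aligned fragments, not data from the article.
observed = [
    ("ficha", "tab"), ("ficha", "tab"), ("ficha", "tab"), ("ficha", "card"),
    ("hacer clic", "click"),
]
repository = mapping_probabilities(observed)

# At runtime, the stored probabilities let the system prefer one mapping.
best = max(repository["ficha"], key=repository["ficha"].get)
print(best, repository["ficha"][best])
```

Here the dictionary `repository` plays the role of a very small mapping store: each source fragment keeps every target it was aligned with, along with a probability that can be consulted later.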
"We thought it would take us a lot longer to make progress on machine translation. It has come together pretty fast," says Dolan. To speed development, the research arm of Microsoft took a page from the product side by creating nightly NLPWin builds that provide regular feedback on progress.
Helping speed this process along, the researchers "…use a huge cluster of 30 computers to retrain the system and rerun a regression test every night," says Richardson. This produces a new NLPWin build each day, as well as a newly updated version of MindNet.
Consequently, the NLP Group sees the impact of the previous day's work on the MT system's effectiveness, making it simpler to recognize positive and negative changes to the code and to fix the latter.
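The nightly comparison can be pictured with a short Python sketch. The article gives no details of the regression test itself, so the per-sentence scores, the `tolerance` parameter, and the function name `compare_builds` are hypothetical; the sketch only shows the general idea of diffing two builds' results.

```python
def compare_builds(previous, current, tolerance=0.0):
    """Split test sentences into those whose score improved and those
    whose score regressed between two builds."""
    improved, regressed = [], []
    for sentence_id, old_score in previous.items():
        new_score = current.get(sentence_id)
        if new_score is None:
            continue  # sentence was not scored by the new build
        if new_score > old_score + tolerance:
            improved.append(sentence_id)
        elif new_score < old_score - tolerance:
            regressed.append(sentence_id)
    return improved, regressed

# Hypothetical per-sentence quality scores from two consecutive builds.
previous = {"s1": 0.40, "s2": 0.75, "s3": 0.60}
current = {"s1": 0.55, "s2": 0.70, "s3": 0.60}

improved, regressed = compare_builds(previous, current)
print("improved:", improved)
print("regressed:", regressed)
```

A report like this, produced every morning, points directly at the sentences affected by the previous day's changes.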
Another longtime obstacle to progress in natural language processing is the lack of an objective means to accurately measure progress. The typical metric, having humans judge the accuracy of a machine translation, makes the process inherently subjective. The NLP Group developed a more objective testing metric, which compares how close the MT system comes to matching an ideal sentence translation. By minimizing the role human judgment plays in determining MT improvements, this approach makes evaluation more quantifiable, as has been the case in speech recognition for decades.
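The article does not name or define the group's metric, so the following Python sketch should be read only as one simple way to measure closeness to an ideal translation: the fraction of n-grams in the system output that also appear in a reference sentence. The function names `ngrams` and `overlap_precision` and the sample sentences are assumptions for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.
    Each reference n-gram can be matched at most as often as it appears."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

# Hypothetical system outputs scored against an ideal translation.
reference = "Click the tab"
print(overlap_precision("Click the tab", reference))
print(overlap_precision("Click on the card", reference))
```

Because the score is computed mechanically from the two strings, the same output always receives the same score, which is the property that makes such a measure usable in an automated nightly test.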

Runtime

Figure 4 illustrates how the MT system works during runtime. In this example, a Spanish source sentence is parsed by NLPWin into its source LF. The next stage, MindMeld, refers to a highly sophisticated process that has consumed the NLP Group's research efforts since 1997. "MindMelding takes a sentence and matches it to the closest conceptual relationship in a MindNet," says Dolan.
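One way to picture that matching step is as an overlap search over labeled edges. The Python sketch below is purely illustrative: the edge triples, the relation labels, the dictionary layout of the store, and the function name `closest_match` are all hypothetical and are not taken from NLPWin or MindNet.

```python
def closest_match(input_edges, stored):
    """Return the stored entry whose source edges cover the largest share
    of the input edges, together with that share (0.0 to 1.0)."""
    best_entry, best_cover = None, 0.0
    if not input_edges:
        return best_entry, best_cover
    for entry in stored:
        cover = len(input_edges & entry["source"]) / len(input_edges)
        if cover > best_cover:
            best_entry, best_cover = entry, cover
    return best_entry, best_cover

# Hypothetical store: each entry pairs source-side edges with target-side
# edges, written as (head, relation, dependent) triples.
stored = [
    {"source": {("hacer", "obj", "clic"), ("hacer", "en", "ficha")},
     "target": {("click", "obj", "tab")}},
    {"source": {("hacer", "obj", "clic"), ("hacer", "en", "icono")},
     "target": {("click", "obj", "icon")}},
]

# Hypothetical edges for "Haga clic en la ficha".
input_lf = {("hacer", "obj", "clic"), ("hacer", "en", "ficha")}
entry, cover = closest_match(input_lf, stored)
print(cover, entry["target"])
```

In this toy case the input is fully covered by the first entry, so its target side can be used directly. A cover below 1.0 would mean that only part of the input was found in the store.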
This is essentially a graph matching process, which takes an input sentence LF and attempts to match it against one or more subgraphs in MindNet. For instance, if the Spanish source LF is uncomplicated, it might exactly match an English target LF in MindNet. Typically, the match requires



16.4 2002
24

PC AI Magazine - PO Box 30130 Phoenix, AZ 85046 - Voice: 602.971.1869 Fax: 602.971.2321
e-mail: info@pcai.com - Comments? webmaster@pcai.com