PC AI 16.4 Sample Version Page24

In this example, the Spanish source sentence "Haga clic en la ficha" and the matching English target sentence "Click the tab" are parsed, using the Morphology, Sketch, Portrait, and Logical Form components, into their respective source and target Logical Forms (LFs). Both LFs undergo statistical processing to identify word associations (e.g., "ficha" and "tab") and the "alignment" of their structures. "This is done for 350,000 sentence pairs in English and Spanish, applying both heuristics (rules) and statistics to find bits of structural alignment across the language boundary. Most of the MT work has gone into the alignment phase, figuring out which bits across that language boundary should align up and what context you need to save," says Dolan. Rules help the system learn the appropriate context, narrowing the search. A probability is attached to each correspondence, or "mapping", for use during runtime. Finally, these transfer mappings are stored in the MindNet repository.

"We thought it would take us a lot longer to make progress on machine translation. It has come together pretty fast," says Dolan. To speed development, the research arm of Microsoft took a page from the product side by creating nightly NLPWin builds that provide feedback on progress. To speed this process along, the researchers "…use a huge cluster of 30 computers to retrain the system and rerun a regression test every night," says Richardson. This produces a new NLPWin build each day, as well as a newly updated version of MindNet. Consequently, the NLP Group sees the impact of the previous day's work on the MT system's effectiveness, making it simpler to recognize positive and negative changes to the code and to fix the latter.

Another longtime obstacle to progress in natural language processing is the lack of an objective means to accurately measure advancements. The typical metric, having humans judge the accuracy of a machine translation, makes the process inherently subjective.
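
As an aside on the training step described earlier, the idea of using statistics over sentence pairs to find word associations such as "ficha" and "tab" can be illustrated with a small sketch. This is not the NLP Group's method: it works on raw words rather than Logical Forms, it uses a simple co-occurrence score (the Dice coefficient) as a stand-in for the probabilities attached to mappings, and the toy corpus and the names `PAIRS`, `association_scores`, and `best_match` are invented for illustration.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus, invented for illustration only
# (the article's system trains on 350,000 sentence pairs).
PAIRS = [
    ("haga clic en la ficha", "click the tab"),
    ("seleccione la ficha", "select the tab"),
    ("haga clic en la ventana", "click the window"),
    ("seleccione la ventana", "select the window"),
]

def association_scores(pairs):
    """Score every (source word, target word) pair with the Dice coefficient."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        joint.update(product(src_words, tgt_words))  # sentence pairs containing both
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

def best_match(word, scores):
    """Return the highest-scoring target word for a source word, with its score."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    target = max(candidates, key=candidates.get)
    return target, candidates[target]

scores = association_scores(PAIRS)
print(best_match("ficha", scores))
```

Words that always appear together across the sentence pairs, like "ficha" and "tab" here, receive the highest score, while function words such as "la" and "the" co-occur with everything and score lower against any single content word. A real system would also need the structural alignment and context rules that the article describes.
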
The NLP Group developed a more objective testing metric, which compares how close the MT system comes to matching an ideal sentence translation. By minimizing the role human judgment plays in determining MT improvements, this approach makes evaluation a more quantifiable process, as has been the case in speech recognition for decades.

Runtime

Figure 4 illustrates how the MT system works during runtime. In this example, a Spanish source sentence is parsed by NLPWin into its source LF. The next stage, MindMeld, refers to a highly sophisticated process that has consumed the NLP Group's research efforts since 1997. "MindMelding takes a sentence and matches it to the closest conceptual relationship in a MindNet," says Dolan. This is essentially a graph matching process, which takes an input sentence LF and attempts to match it against one or more subgraphs in MindNet. For instance, if the Spanish source LF is uncomplicated, it might exactly match an English target LF in MindNet. Typically, the match requires

