paraphrase identification. This involves sliding around on the lexical similarity dimension to locate a match (e.g., "canine" against "dog"). Syntactic paraphrasing may also come into play (e.g., matching "Jupiter has 18 moons" to "Jupiter's 18 moons"). Often both are required ("How many moons does Jupiter have?" vs. "Jupiter's 18 satellites").
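As a rough illustration of matching on lexical similarity, the sketch below reduces each string to a set of normalized content words and compares the sets. The synonym table, the stopword list, and the function names are assumptions made for this example; they are not part of MindNet, which draws on a far richer lexical graph.

```python
import re

# Hypothetical synonym table and stopword list, for illustration only.
SYNONYMS = {"satellite": "moon", "satellites": "moon", "moons": "moon", "canine": "dog"}
STOPWORDS = {"how", "many", "does", "do", "have", "has", "the", "a", "s"}

def content_words(text):
    """Lowercase, tokenize, drop stopwords, and map synonyms to one form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {SYNONYMS.get(tok, tok) for tok in tokens if tok not in STOPWORDS}

def is_paraphrase(a, b):
    """Treat two strings as paraphrases if their content words coincide."""
    return content_words(a) == content_words(b)

def answers(question, statement):
    """A statement can answer a question if it covers the question's content words."""
    return content_words(question) <= content_words(statement)
```

Under these assumptions, `is_paraphrase("Jupiter has 18 moons", "Jupiter's 18 satellites")` holds, and `answers("How many moons does Jupiter have?", "Jupiter's 18 satellites")` holds as well, because the question's content words are a subset of the statement's.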
"MindMelding relies on MindNet's path-finding and lexical similarity routines. Briefly, paths between the least frequent word in the input graph and other words directly connected to it are identified. Along these paths, typically, are words that are found to be similar to one of the endpoints (e.g., looking for paths between 'car' and 'top' might provide paths linked through 'vehicle' or 'hood'). These newly identified words, which aren't simply similar in meaning to the original words but, crucially, similar in this particular lexical context, can now be used for matching if no structures with the original words can be found. This process is iterated, so that a number of contextually similar words can be identified," says Dolan.
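The path-finding idea can be pictured with a small sketch. It assumes a toy, hand-built word graph and a plain breadth-first search; the graph contents and the function names `paths_between` and `bridge_words` are hypothetical, and MindNet's actual routines (which also weight paths and score similarity) are not shown here.

```python
from collections import deque

# Hypothetical toy graph: an edge links two words that stand in some
# lexical relation. Real MindNet links are typed and weighted.
TOY_GRAPH = {
    "car": {"vehicle", "hood", "wheel"},
    "vehicle": {"car", "top"},
    "hood": {"car", "top"},
    "wheel": {"car"},
    "top": {"vehicle", "hood", "hat"},
    "hat": {"top"},
}

def paths_between(graph, start, goal, max_len=3):
    """Return all simple paths from start to goal with at most max_len edges."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            found.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # path is already as long as allowed
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # keep paths simple (no repeated words)
                queue.append(path + [nxt])
    return found

def bridge_words(graph, start, goal, max_len=3):
    """Collect the intermediate words that lie on any path found."""
    words = set()
    for path in paths_between(graph, start, goal, max_len):
        words.update(path[1:-1])
    return words
```

On this toy graph, `bridge_words(TOY_GRAPH, "car", "top")` yields `vehicle` and `hood`, the two linking words from Dolan's example. Iterating the process, as the quote describes, would amount to repeating the search from the newly found words.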
Although the MindMelding algorithm relies on a type of graph matching that is intractable in the worst case (no efficient general solution is known), the wide variety of contextual and linguistic heuristics that the MT system brings to bear on the matching problem keeps worst-case scenarios from arising in practice. Nonetheless, carrying out the match efficiently remains a highly complex challenge.
"We take the Logical Form and try to find pieces that match in the stored database mapping [of MindNet] and follow those to the corresponding link on the English side. Grab all those pieces and sort of Frankenstein monster-like put them together into a Linked Logical Form. Right now we are working on using language modeling techniques to smooth out any differences that make that stitched together Logical Form look non-native…using statistical techniques, we smooth out any wrinkles that don't look like what an English Logical Form should look like," says Dolan.
Once MindMeld has worked its magic, the corresponding pieces of target LFs are stitched together to form an English target LF, which is handed off to the Generation module. "Provided we've done a good job of assembling a LF, the NLPWin's generation component reliably maps that LF into a well-formed target-language sentence," says Dolan. In the example shown in Figure 4, the English string "Click the highlighted sample text" is generated from the original Spanish input "Haga clic en el texto de muestra resaltado."
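The match-and-stitch step can be sketched in a few lines. Everything in the block is an assumption made for illustration: Logical Form pieces are modeled as simple (head, relation, dependent) triples, the relation labels are invented, and the mapping table is hand-written for the Spanish example above rather than taken from MindNet.

```python
# Hypothetical stored mappings from source LF fragments to target LF fragments.
# Each fragment is a (head, relation, dependent) triple with invented labels.
MAPPINGS = {
    ("hacer_clic", "en", "texto"): ("click", "obj", "text"),
    ("texto", "de", "muestra"): ("text", "mod", "sample"),
    ("texto", "attrib", "resaltado"): ("text", "attrib", "highlighted"),
}

def transfer(source_lf, mappings):
    """Look up each source fragment; return (matched targets, unmatched sources)."""
    matched, unmatched = [], []
    for fragment in source_lf:
        if fragment in mappings:
            matched.append(mappings[fragment])
        else:
            unmatched.append(fragment)
    return matched, unmatched

def stitch(fragments):
    """Join target fragments that share a node into one linked structure."""
    linked = {}
    for head, relation, dependent in fragments:
        linked.setdefault(head, []).append((relation, dependent))
        linked.setdefault(dependent, [])
    return linked

source_lf = [
    ("hacer_clic", "en", "texto"),
    ("texto", "de", "muestra"),
    ("texto", "attrib", "resaltado"),
]
matched, unmatched = transfer(source_lf, MAPPINGS)
target_lf = stitch(matched)
```

Here `target_lf` links `click` to `text`, and `text` to `sample` and `highlighted`. The sketch stops at the stitched structure; the statistical smoothing and the Generation module that turn it into a sentence are outside its scope.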
At runtime, the NLP Group's MT system translates all the English text in the Microsoft Product Support Services (PSS) Knowledge Base (KB) into Spanish, allowing users to search the converted KB using Spanish queries. "As articles are added or updated in English (which happens a couple thousand times a week), they will be immediately (re-) translated and posted to the Spanish KB. Occasionally, as the MT system improves, the entire KB will be retranslated using newer versions of the MT system. This will happen incrementally, so users should not experience any down time," notes Richardson. To date, internal Microsoft studies indicate a high level of satisfaction with the results obtained using the translated Spanish PSS Knowledge Base.
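One way to picture the incremental retranslation Richardson describes is a bookkeeping scheme that records, for each translated article, a hash of its English source and the MT version that produced it. This is only a sketch under that assumption; the article does not say how Microsoft tracks changes, and `sync`, `needs_retranslation`, and the `translate` callable are hypothetical names.

```python
import hashlib

def source_hash(text):
    """Fingerprint of the English source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_retranslation(text, record, mt_version):
    """True if the article is new, its source changed, or the MT system is newer."""
    return (
        record is None
        or record["source_hash"] != source_hash(text)
        or record["mt_version"] != mt_version
    )

def sync(english_kb, spanish_kb, translate, mt_version):
    """Retranslate only stale articles; return the ids that were updated."""
    updated = []
    for article_id, text in english_kb.items():
        if needs_retranslation(text, spanish_kb.get(article_id), mt_version):
            spanish_kb[article_id] = {
                "text": translate(text),  # translate is supplied by the caller
                "source_hash": source_hash(text),
                "mt_version": mt_version,
            }
            updated.append(article_id)
    return updated
```

Because each article is replaced in place, the Spanish KB stays searchable while a pass is running, which matches the "no down time" behavior described in the quote.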
Figure 4: An example of how the English string “Click the highlighted sample text” is generated from the original Spanish input “Haga clic en el texto de muestra resaltado.”



16.4 2002
25

PC AI Magazine - PO Box 30130 Phoenix, AZ 85046 - Voice: 602.971.1869 Fax: 602.971.2321
e-mail: info@pcai.com - Comments? webmaster@pcai.com