G Component
George Heidorn spearheaded the development of a
programming language called "G" (short for
"Gamma", and also for "Grammar" or
"George"). G has a lot in common with the AI language LISP, except that it
includes specialized structures for representing linguistic relationships.
G, along with MIND, enabled the NLP Group to transform their conceptual
dreams of an NLP system into the reality of a working program, eventually
known as NLPWin.
Microsoft English Grammar (MEG) Component
Karen Jensen, a leading authority on English grammar,
accomplished the awesome task of creating a comprehensive
set of English grammatical rules, using the G language. These rules,
called the Microsoft English Grammar (MEG), form the basis of the
NLPWin Sketch component. The Sketch module parses text to produce
syntactic structures, which are passed to the next component in the
system. "The beauty of NLPWin is that any ambiguity is retained
and passed up to the next level for resolution there or beyond,"
says Jensen.
Portrait Component
Lucy Vanderwende oversaw the construction of
NLPWin's next stage, Portrait, which uses semantic
information automatically extracted from the definitions and example
sentences in MIND, to determine correct phrasal attachment during
parsing. In other words, the Sketch component does not attach prepositional
phrases, but the Portrait component does.
Logical Form Component
Vanderwende also played a significant role in the
development of the Logical Form component. This module
encodes the abstract relations between the concepts
in a sentence. "Many of these relationships can be captured using
a small set of semantic relationships between a head word and its
modifiers," says Vanderwende.
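The idea of a head word linked to its modifiers by a small set of labeled relations can be sketched as follows. This is only an illustration, not NLPWin's actual representation: the `Node` class, the example sentence, and the relation labels `agent`, `object`, and `attribute` are all hypothetical placeholders chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A concept in a sentence, identified by its head word (hypothetical class)."""
    word: str
    # Each entry is a (relation_label, modifier_node) pair.
    relations: list = field(default_factory=list)

    def add(self, label, modifier):
        """Attach a modifier under a labeled relation and return self for chaining."""
        self.relations.append((label, modifier))
        return self

    def triples(self):
        """Yield (head, relation, modifier) triples, depth first."""
        for label, modifier in self.relations:
            yield (self.word, label, modifier.word)
            yield from modifier.triples()


# Illustrative sentence: "The hawk caught a small fish."
# The relation labels below are assumptions made for this sketch.
fish = Node("fish").add("attribute", Node("small"))
caught = Node("catch").add("agent", Node("hawk")).add("object", fish)

logical_form_triples = list(caught.triples())
```

Flattening the structure into triples shows how a whole sentence reduces to a handful of head-relation-modifier links.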
Perhaps the single biggest challenge in developing
NLPWin was creating the method for storing the mapping
of the complex and abstract relationships among words. Although it was a
group effort, Bill Dolan originated the conceptual framework for a
semantic network. It had to be capable of representing the inter-linking
relationships between the logical forms (grammatical relationships)
among words parsed from machine-readable dictionaries and other sources.
MindNet
Heidorn and Richardson led the way in turning this
theoretical structure into a working code base. The
auto-construction of semantic nets was not a new idea in the early
1990s. However, building a program that trained itself from a variety
of language sources and retained the ambiguity in natural language,
which is critical for discovering the meaning of words, was a radical concept.
After years of experimentation, and a number of breakthroughs, the
NLP Group finally developed the means to auto-construct a semantic
net capable of meeting both requirements, and called it MindNet.
Figure 2 illustrates the conceptual view of how words
interlock in MindNet. For example, the word bird is linked
to hawk through the is_a relationship. Duck is also linked to bird
by the same is_a relationship. By sliding along these relationships,
NLPWin uses the knowledge stored in MindNet to identify the meaning
of words in relation to other words.
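A minimal sketch of this kind of traversal, assuming a toy graph rather than MindNet's real data structures: the `SemanticNet` class, its methods, and the `_inverse` suffix for following a link backwards are all hypothetical names invented for this example. Only the hawk, duck, and bird is_a links come from the text above.

```python
from collections import defaultdict, deque


class SemanticNet:
    """Toy labeled graph of words (hypothetical; not MindNet's actual design)."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, source, relation, target):
        # Store both directions so a path can follow a link either way.
        self.edges[source].append((relation, target))
        self.edges[target].append((relation + "_inverse", source))

    def path(self, start, goal):
        """Breadth-first search for a shortest chain of relations, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            word, steps = queue.popleft()
            if word == goal:
                return steps
            for relation, neighbor in self.edges[word]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, steps + [(word, relation, neighbor)]))
        return None


net = SemanticNet()
net.add("hawk", "is_a", "bird")
net.add("duck", "is_a", "bird")

# Relates hawk to duck by passing through their shared link to bird.
hawk_to_duck = net.path("hawk", "duck")
```

Here `hawk_to_duck` holds two steps: hawk to bird along is_a, then bird to duck along the same link followed in reverse.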