G Component
George Heidorn spearheaded the development of a programming language
called "G" (short for "Gamma", and also for "Grammar"
or "George"). G has a lot in common with the AI language
LISP, except that it includes specialized structures for representing
linguistic relationships. G, along with MIND, enabled the NLP
Group to transform their conceptual dreams of an NLP system into the
reality of a working program, eventually known as NLPWin.
Microsoft English Grammar (MEG) Component
Karen Jensen, a leading authority on English grammar, accomplished
the awesome task of creating a comprehensive set of English grammatical
rules, using the G language. These rules, called the Microsoft English
Grammar (MEG), form the basis of the NLPWin Sketch component. The
Sketch module parses text to produce syntactic structures, which
are passed to the next component in the system. "The beauty
of NLPWin is that any ambiguity is retained and passed up to the
next level for resolution there or beyond," says Jensen.
Portrait Component
Lucy Vanderwende oversaw the construction of NLPWin's next stage,
Portrait, which uses semantic information automatically extracted
from the definitions and example sentences in MIND to determine
correct phrasal attachment during parsing. In other words, the Sketch
component does not attach prepositional phrases, but the Portrait
component does.
Logical Form Component
Vanderwende also played a significant role in the
development of the Logical Form component. This module
encodes the abstract relations between the concepts
in a sentence. "Many of these relationships can be captured
using a small set of semantic relationships between a head word
and its modifiers," says Vanderwende.
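One way to picture a structure of this kind is a head word that carries labeled links to its modifiers. The sketch below is only an illustration: the `Node` class, the `triples` helper, the relation names ("subject", "object", "attribute"), and the sample sentence are all assumptions made for this example, not NLPWin's actual representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A head word plus its labeled modifiers (hypothetical structure)."""
    word: str
    modifiers: list = field(default_factory=list)  # list of (relation, Node) pairs

    def add(self, relation, node):
        """Attach a modifier under the given relation and return self for chaining."""
        self.modifiers.append((relation, node))
        return self


def triples(node):
    """Flatten the structure into (head, relation, modifier) triples, depth first."""
    result = []
    for relation, child in node.modifiers:
        result.append((node.word, relation, child.word))
        result.extend(triples(child))
    return result


# Hypothetical sentence: "The hawk caught a small fish."
fish = Node("fish").add("attribute", Node("small"))
catch = Node("catch").add("subject", Node("hawk")).add("object", fish)
relations = triples(catch)
```

Here `relations` holds one triple per head-modifier link, which is the kind of flat form that could later be merged into a larger network of words.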
Perhaps the single biggest challenge in developing NLPWin was creating
the method for storing the mapping of the complex and abstract relationships
among words. Although this was a group effort, Bill Dolan originated the
conceptual framework for a semantic network. It had to be capable
of representing the inter-linking relationships between the logical
forms (grammatical relationships) among words parsed from machine-readable
dictionaries and other sources.
MindNet
Heidorn and Richardson led the way in turning this theoretical
structure into a working code base. The auto-construction of semantic
nets was not a new idea in the early 1990s. However, building a
program that trained itself from a variety of language sources and
retained the ambiguity in natural language, which is critical for discovering
the meaning of words, was a radical concept. After years of experimentation
and a number of breakthroughs, the NLP Group finally developed the
means to auto-construct a semantic net that met
both requirements and called it MindNet.
Figure 2 illustrates the conceptual view of how words interlock
in MindNet. For example, the word bird maps to hawk through the
is_a relationship. Duck also interlocks with bird by the same is_a
relationship. By sliding along these relationships, NLPWin uses
the knowledge stored in MindNet to identify the meaning of words
in relation to other words.
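The idea of sliding along relationships can be sketched as a search over a small labeled graph. The example below is a toy illustration, not MindNet itself: it holds only the two is_a links named above, and the `build_graph` and `find_path` helpers, along with the `_inverse` suffix for following a link backwards, are assumptions introduced for this sketch.

```python
from collections import defaultdict, deque

# Toy data: only the two is_a links mentioned in the text.
TRIPLES = [
    ("hawk", "is_a", "bird"),
    ("duck", "is_a", "bird"),
]


def build_graph(triples):
    """Index each labeled link in both directions so it can be followed either way."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
        graph[target].append((relation + "_inverse", source))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal.

    Returns a list of (word, relation, word) steps, or None if no chain exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        word, path = queue.popleft()
        if word == goal:
            return path
        for relation, neighbor in graph[word]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(word, relation, neighbor)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "hawk", "duck")
```

With this data, `path` connects hawk to duck through their shared link to bird, which is one simple reading of how two words can be related through a common neighbor.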