"The goal of NLPWin is to enable the machine
to
produce an internal representation that corresponds
to what we understand in our minds when we hear natural language.
This is the key - the understanding of natural language leads to intelligence.
I do not think humans become intelligent just through natural language.
I think as we are born we take in all kinds of sensory input. We have
emotions that are just native to react with, we learn by being immersed
in this environment, language comes along and we put the symbols on
these experiences. Machines are not like that, if they are going to
become intelligent, it is going to have to be some other way. Therefore,
our way is through experience and their way is through symbol manipulation.
We put symbols on our experience and machines are going to have to
learn to put experiences on the symbols," says Karen Jensen,
former manager of the NLP Group.
This bottom-up vision for building intelligent
machines
flies in the face of large-scale top-down AI efforts
such as the 18-year-old CYC project pioneered by AI legend Doug Lenat.
A second major area of difference between NLPWin and CYC is in self-training.
The NLP Group strongly believes that CYC's handcrafting is counterproductive.
Every time CYC encounters a new term, it requires more hand coding
to surgically implant the new knowledge, slowing development and possibly
introducing conflicting information. NLPWin, by contrast, assimilates
the meaning of words automatically from the text itself.
Assimilating the Meaning of Words from the Text
This process involves a series of successive stages, beginning with
a rudimentary analysis of how words connect to form grammatically
correct sentences. It then explores the deeper structures of the language,
hoping to attach meanings to the words and sentences in the context
of the world. As shown in Figure 1, the system's first component breaks
down, or parses, sentences, arranging their words in a tree-like structure.
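The idea of parsing a sentence into a tree-like structure can be sketched with a toy example. The grammar, lexicon, and function below are invented purely for illustration; NLPWin's actual parser handles vastly richer rules.

```python
# Toy illustration of parsing a sentence into a tree-like structure.
# The three-rule grammar here (S -> NP VP, NP -> Det N, VP -> V) and
# the tiny lexicon are hypothetical, not NLPWin's actual rules.

LEXICON = {"the": "Det", "dog": "N", "jumps": "V"}

def parse(sentence):
    """Parse a 'Det N V' sentence into a nested (label, children...) tree."""
    words = sentence.split()
    tags = [LEXICON[w] for w in words]
    if tags != ["Det", "N", "V"]:
        raise ValueError("toy grammar only covers 'Det N V' sentences")
    det, noun, verb = words
    return ("S",
            ("NP", ("Det", det), ("N", noun)),
            ("VP", ("V", verb)))

print(parse("the dog jumps"))
# ('S', ('NP', ('Det', 'the'), ('N', 'dog')), ('VP', ('V', 'jumps')))
```

The nested tuples mirror the tree diagram: the sentence node S branches into a noun phrase and a verb phrase, each of which branches down to the individual words.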
The next component, Morphology, identifies the various forms of a
word. For example, the root word jump has a variety of variations,
or morphs, such as jumping, jumped, and jumps. By storing just the
root word jump, and retaining the capacity to recognize its other
morphs, the system saves roughly half the space it would otherwise
need to store all variations of English words. The savings are even
greater for other languages, such as Spanish, Arabic, and Japanese,
where they can run three to four times as high.
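The space-saving scheme described above can be sketched in a few lines: store only root words, and recover the root of an inflected form by stripping known suffixes. The suffix list and function name here are simplifications invented for this example, not NLPWin's actual morphology rules.

```python
# Sketch: store only root words and recognize inflected forms
# ("morphs") by stripping suffixes. The suffix rules below are a
# hypothetical simplification; real morphology also handles spelling
# changes (e.g. "run" -> "running") and irregular forms.

ROOTS = {"jump", "walk"}

SUFFIXES = ["ing", "ed", "s", ""]  # try longer suffixes first

def find_root(word):
    """Return (root, suffix) if stripping a known suffix yields a stored root."""
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)] if suffix else word
        if stem in ROOTS:
            return stem, suffix
    return None  # not derivable from any stored root

print(find_root("jumping"))  # ('jump', 'ing')
print(find_root("jumped"))   # ('jump', 'ed')
print(find_root("jumps"))    # ('jump', 's')
```

Only one entry, jump, is stored, yet four surface forms are recognized, which is the source of the space savings the article describes.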
Microsoft Natural Language Dictionary (MIND) Component
Joseph Pentheroudakis designed the Morphology component and the Microsoft
Natural Language Dictionary (MIND), which was originally built using
two different machine-readable dictionaries, the Longman Dictionary
of Contemporary English and the American Heritage Dictionary. Although
NLPWin uses dictionaries to train itself, "the parser has not
been specifically tuned to process dictionary definitions. All enhancements
to the parser are geared to handle the immense variety of general
text, of which dictionary definitions are simply a modest subset,"
says Pentheroudakis.