"The goal of NLPWin is to enable the machine to produce an internal representation that corresponds
to what we understand in our minds when we hear natural language.
This is the key: the understanding of natural language leads to intelligence.
I do not think humans become intelligent just through natural language.
I think as we are born we take in all kinds of sensor input. We have
emotions that are just native to react with, we learn by being immersed
in this environment, and language comes along and we put the symbols on
these experiences. Machines are not like that; if they are going to
become intelligent, it is going to have to be some other way. Therefore,
our way is through experience and their way is through symbol manipulation.
We put symbols on our experience and machines are going to have to
learn to put experiences on the symbols," says Karen Jensen,
former manager of the NLP Group.
This bottom-up vision for building intelligent machines
flies in the face of large-scale, top-down AI efforts
such as the 18-year-old CYC project pioneered by AI legend Doug Lenat.
A second major difference between NLPWin and CYC lies in self-training.
The NLP Group strongly believes that CYC's handcrafting is counterproductive.
Every time CYC encounters new vocabulary, it requires more hand coding
to surgically implant the new knowledge, slowing development and possibly
creating conflicting information. Instead, NLPWin automatically assimilates
the meaning of words from the text.
Assimilating the Meaning of Words from the Text
This process involves a series of stages,
beginning with a rudimentary analysis of how words
connect to form grammatically correct sentences. It then explores
deeper structures in the language in an effort to attach meanings to
words and sentences in the context of the world. As shown in Figure 1,
the system's first component breaks down, or parses, the input,
arranging its words in a tree-like structure.
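To make "arranging words in a tree-like structure" concrete, here is a minimal sketch of a recursive-descent parser. It assumes a hypothetical four-rule toy grammar (S -> NP VP, NP -> Det Noun, VP -> Verb [NP]) and a hypothetical seven-word lexicon; the names `Node`, `LEXICON`, and `parse` are illustrative only. NLPWin's real grammar is vastly larger, and the article does not describe its parsing algorithm, so this shows only the kind of output a parser produces, not how NLPWin builds it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                                   # phrase or part-of-speech label, e.g. "NP" or "Noun"
    children: list = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaf nodes

    def bracketed(self) -> str:
        """Render the tree in labeled-bracket notation."""
        if self.word is not None:
            return f"({self.label} {self.word})"
        return f"({self.label} " + " ".join(c.bracketed() for c in self.children) + ")"


# Hypothetical toy lexicon: each word maps to a single part of speech.
LEXICON = {"the": "Det", "a": "Det", "dog": "Noun", "cat": "Noun",
           "fence": "Noun", "jumped": "Verb", "saw": "Verb"}


def parse(words):
    """Parse a list of words with the toy grammar
    S -> NP VP ; NP -> Det Noun ; VP -> Verb [NP].
    Raises ValueError if the words do not fit the grammar."""
    pos = 0

    def leaf(expected):
        nonlocal pos
        if pos < len(words) and LEXICON.get(words[pos]) == expected:
            node = Node(expected, word=words[pos])
            pos += 1
            return node
        raise ValueError(f"expected {expected} at position {pos}")

    def noun_phrase():
        return Node("NP", [leaf("Det"), leaf("Noun")])

    def verb_phrase():
        children = [leaf("Verb")]
        if pos < len(words):                     # optional object noun phrase
            children.append(noun_phrase())
        return Node("VP", children)

    tree = Node("S", [noun_phrase(), verb_phrase()])
    if pos != len(words):
        raise ValueError("trailing words not covered by the grammar")
    return tree


if __name__ == "__main__":
    # prints (S (NP (Det the) (Noun dog)) (VP (Verb saw) (NP (Det a) (Noun cat))))
    print(parse("the dog saw a cat".split()).bracketed())
```

A recursive-descent parser is used here only because each grammar rule maps directly onto one function, which keeps the tree-building step easy to follow.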
The next component, Morphology, identifies the
various forms of a word. For example, the root word
jump has several variant forms, or morphs, such as jumping, jumped,
and jumps. By storing only the root word jump while retaining the ability
to recognize its other morphs, the system needs roughly half the space
it would otherwise require to store every variant of English words.
The savings are even greater for languages such as Spanish, Arabic,
and Japanese, where the reduction can be as much as three- to fourfold.
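The idea of storing only roots and recognizing their morphs can be illustrated with a minimal sketch. It assumes a hypothetical root list (`ROOTS`), three regular English suffixes, and two common spelling rules (a dropped silent "e" and a doubled final consonant); the names `ROOTS`, `SUFFIXES`, `candidate_stems`, and `analyze` are illustrative and are not taken from the NLPWin Morphology component, which the article does not describe at this level. Real morphological analyzers also handle irregular forms (ran, went) and much richer inflection, especially for the other languages mentioned above.

```python
# Hypothetical root lexicon: only base forms are stored.
ROOTS = {"jump", "walk", "bake", "hop"}

# Hypothetical inflectional suffixes for regular English verbs.
SUFFIXES = ("ing", "ed", "s")


def candidate_stems(stem):
    """Yield plausible roots for what remains after removing a suffix,
    undoing two common English spelling changes."""
    yield stem                                   # jump+ed  -> jump
    yield stem + "e"                             # bak+ed   -> bake (silent e was dropped)
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        yield stem[:-1]                          # hopp+ing -> hop (final consonant was doubled)


def analyze(word):
    """Return (root, suffix) if word is a stored root or a regular
    inflection of one; otherwise return None."""
    if word in ROOTS:
        return (word, "")
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            for root in candidate_stems(stem):
                if root in ROOTS:
                    return (root, suffix)
    return None


if __name__ == "__main__":
    for w in ("jump", "jumping", "jumped", "jumps", "baked", "hopping"):
        print(w, "->", analyze(w))
```

Only the four roots occupy storage; every listed morph is recovered at lookup time, which is the trade-off the paragraph describes: a little computation in exchange for a smaller dictionary.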
Microsoft Natural Language Dictionary (MIND) Component
Joseph Pentheroudakis designed the Morphology
component and the Microsoft Natural Language Dictionary
(MIND), which was originally built from two machine-readable
dictionaries: the Longman Dictionary of Contemporary English and the
American Heritage Dictionary. Although NLPWin uses dictionaries to
train itself, "the parser has not been specifically tuned to
process dictionary definitions. All enhancements to the parser are
geared to handle the immense variety of general text, of which dictionary
definitions are simply a modest subset," says Pentheroudakis.