met Xuedong (XD) Huang
(http://research.microsoft.com/srg/xdh), a graduate student studying at the
University of Edinburgh in Scotland. In 1986, he traveled to CMU to
work as a visiting scientist and hit it off with Lee; the two became
lifelong friends.
In 1989, Lee's CMU team
won the DARPA-sponsored evaluation with a system named Sphinx
(http://fife.speech.cs.cmu.edu/sphinx). This was the world's first
speaker-independent continuous speech recognition system capable of
recognizing a 1,000-word resource management vocabulary. Huang
also contributed to Sphinx, working six months with the CMU team to
prepare the system for operation. Upon Huang's graduation from Edinburgh,
Lee recruited him to CMU. In 1990, Lee left academia to work for
Apple, Inc. Their paths did not cross professionally again until the
late 1990s.
Huang took over Lee's position
and led the effort to develop Sphinx II. His CMU speech team consisted
of Mei-Yuh Hwang, Fil Alleva, and Roni Rosenfeld. Sphinx II was to
compete in the next DARPA evaluation, with the first 60,000-word vocabulary
system. "NIST gives the data to every site. Our results were
so good, so much better than the second best, some thought we were
fudging it," says Huang. In 1992, Sphinx II not only had the
most accurate results but also achieved the largest error reduction in the
history of the DARPA-funded speech program.
"My team then received the
Allen Newell Medal. Jim Morris, the head of the department at CMU,
handed out the medal at a Wean Hall party. We were the first ones
to receive the Allen Newell award," recalls Huang.
Naturally, this type of success
attracted attention. Microsoft, which had just founded its own research
lab, soon began to intensely recruit Huang and his speech team. Huang
said no for six months, realizing how many people at CMU
he would disappoint by leaving. "The final deciding factor was
the opportunity to position speech technology into the world's
most popular PC operating system," says Huang.
Championing Speech Technology at Microsoft
In January of 1993, Huang came
to Microsoft, along with two members of his CMU speech team (Fil
Alleva and Mei-Yuh Hwang), to work in the mobile
business unit of the Operating System (OS) product
group. Nathan Myhrvold, head of Microsoft Research (MSR), had initially
tried to convince Huang to come into MSR, but Huang had come to productize
speech technology, not to continue with basic research.
Nonetheless, within a month of
Huang's arrival, Bill Gates transferred his speech team out of product
development into Microsoft Research. Gates felt that the PC hardware
was not yet sufficiently powerful to support speech recognition, and
he sent a memo to the head of the OS group, Paul Maritz, transferring
Huang's speech team to MSR. "I was a little disappointed
at the time," recalls Huang.
In due course, Huang came to
see the wisdom of the relocation to MSR. He now acknowledges that
his team's speech algorithms needed further improvement before
product deployment. "I thought the speech technology was ready
because I was naïve about the level of reliability and robustness
required for productizing software," says Huang.
Once inside Microsoft Research,
Huang's team built a prototype continuous speech engine, codenamed
Whisper, based on the work done earlier at CMU with Sphinx II. To
access this new speech engine, the Speech Technology Group (STG) also
developed a Speech Application Programming Interface (SAPI). Huang
credits a brilliant Microsoft software development engineer, Milind
Mahajan, with enabling the STG to make rapid progress with Whisper
and SAPI. "Within six months, we fully staffed the team, ported
the code to Windows NT, and compressed it to a 1 MB footprint. By July
of '93, we were ready for a demonstration," says Huang.
The first demonstration of Whisper
was a comparison with software from Dragon Systems. Ironically, both
programs stemmed from DARPA-funded speech research at CMU. Karen Hargrove,
then program manager of the Digital Office Group, oversaw the system
comparisons. Whisper outperformed Dragon's offering. "Our
DARPA evaluation experience played a key role in the win," says
Huang.
In 1994, the STG staff had grown
to 20 and worked with a small group of product developers to deliver
SAPI 1.0 and Whisper for inclusion in the Windows 95 SDK (Software
Development Kit). Mike Rozak spearheaded the SAPI 1.0 effort on the
product side, while Huang oversaw refinement of the speech engine
(Whisper) in research. The actual shipped code for the speech recognizer,
written