met Xuedong (XD) Huang (http://research.microsoft.com/srg/xdh), a graduate student at the University of Edinburgh in Scotland. In 1986, Huang traveled to CMU as a visiting scientist and hit it off with Lee; the two became lifelong friends.
       In 1989, Lee’s CMU team won the DARPA-sponsored evaluation with a system named Sphinx (http://fife.speech.cs.cmu.edu/sphinx). This was the world’s first speaker-independent continuous speech recognition system capable of recognizing a 1,000-word resource management vocabulary. Huang also contributed to Sphinx, spending six months with the CMU team preparing the system for operation. When Huang graduated from Edinburgh, Lee recruited him to CMU. In 1990, Lee left academia to work for Apple Computer. Their paths did not cross professionally again until the late 1990s.
       Huang took over Lee’s position and led the effort to develop Sphinx II. His CMU speech team consisted of Mei-Yuh Hwang, Fil Alleva, and Roni Rosenfeld. Sphinx II was to compete in the next DARPA evaluation as the first system with a 60,000-word vocabulary. “NIST gives the data to every site. Our results were so good, so much better than the second best, some thought we were fudging it,” says Huang. In 1992, Sphinx II not only had the most accurate results but also achieved the largest error reduction in the history of the DARPA-funded speech program.
       “My team then received the Allen Newell Medal. Jim Morris, the head of the department at CMU, handed out the medal at a Wean Hall party. We were the first ones to receive the Allen Newell award,” recalls Huang.
       Naturally, this type of success attracted attention. Microsoft, which had just founded its own research lab, soon began to intensely recruit Huang and his speech team. Huang said “no” for six months, realizing how many people at CMU he would disappoint by leaving. “The final deciding factor was the opportunity to position speech technology into the world’s most popular PC operating system,” says Huang.

Championing Speech Technology at Microsoft
       In January of 1993, Huang came to Microsoft, along with two members of his CMU speech team (Fil Alleva and Mei-Yuh Hwang), to work in the mobile business unit of the Operating System (OS) product group. Nathan Myhrvold, head of Microsoft Research (MSR), had initially tried to convince Huang to join MSR, but Huang had come to Microsoft to productize speech technology, not to continue basic research.
       Nonetheless, within a month of Huang’s arrival, Bill Gates transferred Huang’s speech team out of product development and into Microsoft Research. Gates felt that PC hardware was not yet powerful enough to support speech recognition, and he sent a memo to the head of the OS group, Paul Maritz, transferring Huang’s speech team to MSR. “I was a little disappointed at the time,” recalls Huang.
       In due course, Huang came to see the wisdom of the relocation to MSR. He now acknowledges that his team’s speech algorithms needed further improvement before product deployment. “I thought the speech technology was ready because I was naïve about the level of reliability and robustness required for productizing software,” says Huang.
       Once inside Microsoft Research, Huang’s team built a prototype continuous speech engine, codenamed Whisper, based on the work done earlier at CMU with Sphinx II. To provide access to this new speech engine, the Speech Technology Group (STG) also developed a Speech Application Programming Interface (SAPI). Huang credits a brilliant Microsoft software development engineer, Milind Mahajan, with enabling the STG to make rapid progress with Whisper and SAPI. “Within six months, we fully staffed the team, ported the code to Windows NT, and compressed it to a 1MB footprint. By July of ’93, we were ready for a demonstration,” says Huang.
       The first demonstration of Whisper was a comparison with software from Dragon Systems. Ironically, both programs stemmed from DARPA-funded speech research at CMU. Karen Hargrove, then program manager of the Digital Office Group, oversaw the system comparisons. Whisper outperformed Dragon’s offering. “Our DARPA evaluation experience played a key role in the win,” says Huang.
       In 1994, the STG staff had grown to 20 and worked with a small group of product developers to deliver SAPI 1.0 and Whisper for inclusion in the Windows 95 SDK (Software Development Kit). Mike Rozak spearheaded the SAPI 1.0 effort on the product side, while Huang oversaw refinement of the speech engine (Whisper) in research. The actual shipped code for the speech recognizer, written

16.6

PC AI Magazine - PO Box 30130 Phoenix, AZ 85046 - Voice: 602.971.1869 Fax: 602.971.2321
e-mail: info@pcai.com - Comments? webmaster@pcai.com