CMU-CS-85-136

Computer Science Department
School of Computer Science, Carnegie Mellon University


Learning to Recognize Speech Sounds: A Theory and Model

Gary L. Bradshaw

June 1985 - Thesis

Theories of human speech perception have emphasized the role of innate feature detectors in speech comprehension. Empirical evidence suggests that theories based on specialized feature detectors are wrong, and that human listeners improve in their ability to identify the basic sounds of their language. A learning theory of speech perception is proposed to account for the evidence. To test the theory, a computer simulation, NEXUS, was created. When provided with a simple vocabulary consisting of the names of the letters of the alphabet, NEXUS was able to create descriptions of all words, identify the similarities between words, and simplify the network by eliminating redundant information. The resulting word network was used to classify new instances of speech. NEXUS outperformed Cicada, a state-of-the-art speech recognition system, on both speakers tested. NEXUS serves as a sufficiency proof of the learning theory, although the lack of detailed learning data precludes stronger comparisons with human performance. NEXUS also demonstrates that learning heuristics can be very useful in building computer systems to perform perceptual tasks, such as speech recognition or vision. These heuristics do not require statistical assumptions about the form of the distribution underlying the data.

90 pages

