CMU-HCII-14-105 Human-Computer Interaction Institute School of Computer Science, Carnegie Mellon University
Enabling Non-Speech Experts to Develop Usable Speech-User Interfaces Anuj Kumar August 2014 Ph.D. Thesis
To address the above problem, we take the view that while it can take a prohibitive amount of time and cost to train non-experts in the nuances of speech recognition and user-interface development, well-trained speech experts and user-interface specialists who routinely build working recognizers have accumulated years of experiential knowledge that we can study and formalize for the benefit of non-experts. Moreover, core speech recognition technology has reached a point where, given enough expertise and in-domain data, a working system can be developed for almost every user group and acoustic or language situation. To this end, we design, develop, and evaluate a speech toolkit called SToNE, which embeds expert knowledge and lowers the entry bar for non-experts into the design and development space of speech systems. Our goal is not to render the speech expert superfluous, but to make it easier for non-speech experts to figure out why a speech system is failing, and to guide their efforts in the right direction. We investigate three research goals: (i) how can we elicit and formalize the tacit knowledge that speech experts employ in building an accurate recognizer, (ii) what analysis supports (automatic or semi-automatic) can we develop to enable speech recognizer development by non-experts, and (iii) to what extent do non-experts benefit from SToNE. Through both lab experiments with new datasets and summative evaluations with non-experts, we show that with the support of SToNE, non-experts are able to build recognizers with accuracy similar to that of experts, and achieve significant gains over what they achieve when SToNE support is unavailable to them. This work aims to support the "black art" of speech-user interface (SUI) development. It contributes to human-computer interaction by developing tools that support non-speech experts in building usable SUIs.
It also contributes to speech technologies by formalizing experts' knowledge and offering a set of tools to analyze speech data systematically.
127 pages