Computer Science Department
School of Computer Science, Carnegie Mellon University
A Framework and Toolkit for the Construction of Multimodal Learning Interfaces
Minh Tue Vo
This dissertation makes contributions in three main areas: a theory of multimodal interaction, a software architecture with a reusable application framework, and rapid application prototyping through domain-specific instantiation of a common underlying architecture.
The first major contribution, which is the foundation of the application framework and the rapid prototyping tools, is a model of multimodal interpretation based on semantic integration of information streams. This model supports most conceivable human communication modalities in the context of a broad class of applications, specifically those that support state manipulation via parameterized actions. The multimodal semantic model is also the basis for a flexible, domain-independent, incrementally trainable multimodal interpretation algorithm based on a connectionist network.
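As a rough illustration of what interpreting inputs as parameterized actions can mean, the following minimal sketch merges fragments from a speech stream and a pen stream into a single action with its parameters. It is not the connectionist interpretation algorithm developed in the dissertation: the `Event` type, the `integrate` function, the fixed time window, and the example values are all hypothetical and serve only to show the shape of the problem.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """One interpreted input fragment from a single modality (hypothetical type)."""
    modality: str                  # e.g. "speech" or "pen"
    time: float                    # time of the fragment, in seconds
    action: Optional[str] = None   # action name, if this fragment supplies one
    params: dict = field(default_factory=dict)  # parameter slots it fills


def integrate(events, window=2.0):
    """Combine fragments into one (action, params) pair.

    Assumption for this sketch: the first fragment that names an action is
    the anchor, and any fragment within `window` seconds of it may fill
    parameter slots. The earliest fragment wins when a slot is filled twice.
    """
    anchors = [e for e in events if e.action is not None]
    if not anchors:
        return None  # no fragment named an action, so nothing to execute
    anchor = anchors[0]
    params = {}
    for e in sorted(events, key=lambda ev: ev.time):
        if abs(e.time - anchor.time) <= window:
            for slot, value in e.params.items():
                params.setdefault(slot, value)
    return anchor.action, params


# Spoken "move this there" accompanied by two pen gestures; the last pen
# event arrives too late to belong to the same command.
demo_events = [
    Event("speech", 0.0, action="move"),
    Event("pen", 0.4, params={"object": "circle_3"}),
    Event("pen", 1.1, params={"destination": (120, 45)}),
    Event("pen", 9.0, params={"object": "square_1"}),
]
result = integrate(demo_events)
```

Here the speech stream supplies the action and the pen stream supplies its parameters, so neither stream alone is sufficient to change the application state. A trainable interpreter would replace the fixed window and first-wins rule with learned associations between input fragments and parameter slots.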
The second major contribution is an application framework consisting of reusable components and a modular, distributed system architecture. Multimodal application developers can rapidly construct a new application using the components in the framework, accepting default options when appropriate and providing application-specific customizations when needed.
The third major contribution is a design process, backed by a workbench of tools, that supports the rapid prototyping of multimodal applications. This design process systematically constructs the customizations needed to interpret multimodal inputs in a given domain, allowing an application structure created in the proposed framework to be instantiated for that domain.
The application framework and design process have been successfully applied to the construction of three multimodal systems in three different domains.