Computer Science Department
School of Computer Science, Carnegie Mellon University


A Framework and Toolkit for the Construction of Multimodal Learning Interfaces

Minh Tue Vo

April 1998

Ph.D. Thesis

Multimodal human-computer interaction, in which the computer accepts input from multiple channels or modalities, is more flexible, natural, and powerful than unimodal interaction with input from a single modality. Many research studies have reported that users strongly prefer combinations of human communication means such as speech, gestures, handwriting, and eye movement. Unfortunately, developing multimodal applications is difficult and still suffers from a lack of generality, so that much effort is duplicated when implementing different applications that share common aspects. The research presented in this dissertation aims to provide a partial solution to this problem by creating a modular, distributed, and customizable infrastructure to facilitate the construction of such applications.

This dissertation makes contributions in three main areas: the theory of multimodal interaction; a software architecture and reusable application framework; and rapid application prototyping by domain-specific instantiation of a common underlying architecture.

The foundation of the application framework and the rapid prototyping tools is a model of multimodal interpretation based on semantic integration of information streams. This model supports most of the conceivable human communication modalities in the context of a broad class of applications, specifically those that support state manipulation via parameterized actions. The multimodal semantic model is also the basis for a flexible, domain-independent, incrementally trainable multimodal interpretation algorithm based on a connectionist network.

The second major contribution is an application framework consisting of reusable components and a modular, distributed system architecture. Multimodal application developers can rapidly construct a new application using the components in the framework, accepting default options when appropriate and providing application-specific customizations when needed.

The third major contribution is a design process backed by a workbench of tools to permit the rapid prototyping of a multimodal application. This design process systematically constructs customizations needed to interpret multimodal inputs in a given domain, allowing an application structure created in the proposed framework to be instantiated for that domain.

The application framework and design process have been successfully applied to the construction of three multimodal systems in three different domains.

204 pages
