@device(postscript)
@libraryfile(Mathematics10)
@libraryfile(Accents)
@style(fontfamily=timesroman,fontscale=11)
@pagefooting(immediate, left "@c", center "@c", right "@c")
@heading(Expectation-Based Selective Attention)
@heading(CMU-CS-96-182)
@center(@b(Shumeet Baluja))
@center(October 1996 - Ph.D. Thesis)
@center(FTP: Unavailable)
@blankspace(1)
@begin(text,spacing=1)
In many real-world tasks, the ability to focus attention on the relevant portions of the input is crucial for good performance. This work shows that, for temporally coherent inputs, a computed expectation of the next time step's inputs provides a basis for focusing attention. Expectations are useful in tasks that arise in both visual and non-visual domains, ranging from scene analysis to anomaly detection.

When temporally related inputs are available, an expectation of the next input's contents can be computed from the current inputs. A saliency map, derived from the computed expectation and the actual inputs, indicates which inputs will be important for performing the task in the next time step. For example, in many visual object tracking problems, the relevant features are predictable, while the distractions in the scene are either unpredictable or unrelated to the task. Task-specific selective attention methods can then be used to create a saliency map that accentuates only the predictable inputs that are useful for solving the task. In a second use of expectation, anomaly detection, it is the unexpected features that are important. Here, the role of expectation is reversed: it is used to emphasize the unpredicted features.

The performance of these methods is demonstrated in artificial neural network based systems on two real-world vision tasks: lane-marker tracking for autonomous vehicle control and driver monitoring, and hand tracking in cluttered scenes. For the hand-tracking task, techniques for incorporating domain knowledge that is available @i(a priori) are presented. These methods are also demonstrated in a non-vision task: anomaly detection in the plasma-etch step of semiconductor wafer fabrication.

In addition to explicitly creating a saliency map to indicate where a network should pay attention, techniques are developed to reveal a network's @i(implicit) saliency map. The implicit saliency map represents the portions of the input to which a network will pay attention in the absence of the explicit focusing mechanisms developed in this thesis. Methods for examining the features a network has encoded in its hidden layers are also presented. These techniques are applied to networks trained to perform face detection in arbitrary visual scenes; the results clearly display the facial features the network determines to be most important for face detection. These techniques address one of the most common criticisms of artificial neural networks: that it is difficult to understand what they encode.
@blankspace(2line)
@begin(transparent,size=10)
@b(Keywords:@ )@c
@end(transparent)
@blankspace(1line)
@end(text)
@flushright(@b[(209 pages)])
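
@blankspace(1line)
The following is a minimal sketch, added for illustration only, of how an expectation-based saliency map of the kind described in the abstract might be computed. It assumes dense array-valued inputs and a simple Gaussian weighting of the prediction error; the function and parameter names are hypothetical and are not taken from the thesis.
@begin(verbatim)
import numpy as np

def saliency_map(expected, actual, mode="tracking", scale=1.0):
    """Toy expectation-based saliency map (illustrative only).

    expected : predicted next input (2-D array)
    actual   : observed input (2-D array)
    For tracking, predictable inputs are emphasized; for anomaly
    detection, the unpredicted inputs are emphasized.
    """
    error = np.abs(actual - expected)
    # 0 where the input matched the expectation, near 1 where it did not
    surprise = 1.0 - np.exp(-(error / scale) ** 2)
    if mode == "tracking":
        return 1.0 - surprise      # attend to predictable features
    if mode == "anomaly":
        return surprise            # attend to unexpected features
    raise ValueError("mode must be 'tracking' or 'anomaly'")

# Example: an observed frame that differs from the prediction in one spot.
rng = np.random.default_rng(0)
expected = rng.random((8, 8))
actual = expected.copy()
actual[2, 3] += 0.9                # an unpredicted change
print(saliency_map(expected, actual, mode="anomaly").round(2))
@end(verbatim)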