CMU-CS-01-108
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-01-108

Generalized Feature Extraction for Structural Pattern
Recognition in Time-Series Data

Robert T. Olszewski

May 2001

Ph.D. Thesis

CMU-CS-01-108.ps
CMU-CS-01-108.pdf


Keywords: Structural pattern recognition, classification, feature extraction, time series, Fourier transformation, wavelets, seminconductor fabrication, electrocardiography


Pattern recognition encompasses two fundamental tasks: description and classification. Given an object to analyze, a pattern recognition system first generates a description of it (i.e., the pattern) and then classifies the object based on that description (i.e., the recognition). Two general approaches for implementing pattern recognition systems, statistical and structural, employ different techniques for description and classification. Statistical approaches to pattern recognition use decision-theoretic concepts to discriminate among objects belonging to different groups based upon their quantitative features. Structural approaches to pattern recognition use syntactic grammars to discriminate among objects belonging to different groups based upon the arrangement of their morphological (i.e., shape-based or structural) features. Hybrid approaches to pattern recognition combine aspects of both statistical and structural pattern recognition.

Structural pattern recognition systems are difficult to apply to new domains because implementation of both the description and classification tasks requires domain knowledge. Knowledge acquisition techniques necessary to obtain domain knowledge from experts are tedious and often fail to produce a complete and accurate knowledge base. Consequently, applications of structural pattern recognition have been primarily restricted to domains in which the set of useful morphological features has been established in the literature (e.g., speech recognition and character recognition) and the syntactic grammars can be composed by hand (e.g., electrocardiogram diagnosis). To overcome this limitation, a domain-independent approach to structural pattern recognition is needed that is capable of extracting morphological features and performing classification without relying on domain knowledge. A hybrid system that employs a statistical classification technique to perform discrimination based on structural features is a natural solution. While a statistical classifier is inherently domain independent, the domain knowledge necessary to support the description task can be eliminated with a set of generally-useful morphological features. Such a set of morphological features is suggested as the foundation for the development of a suite of structure detectors to perform generalized feature extraction for structural pattern recognition in time-series data.

The ability of the suite of structure detectors to generate features useful for structural pattern recognition is evaluated by comparing the classification accuracies achieved when using the structure detectors versus commonly-used statistical feature extractors. Two real-world databases with markedly different characteristics and established ground truth serve as sources of data for the evaluation. The classification accuracies achieved using the features extracted by the structure detectors were consistently as good as or better than the classification accuracies achieved when using the features generated by the statistical feature extractors, thus demonstrating that the suite of structure detectors effectively performs generalized feature extraction for structural pattern recognition in time-series data.

111 pages


Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu