CMU-CS-99-102
Computer Science Department
School of Computer Science, Carnegie Mellon University



Comparing Methods for Multivariate Nonparametric Regression

David L. Banks, Robert T. Olszewski, Roy A. Maxion

January 1999


Keywords: Multivariate nonparametric regression, linear regression, stepwise linear regression, additive models, AM, projection pursuit regression, PPR, recursive partitioning regression, RPR, multivariate adaptive regression splines, MARS, alternating conditional expectations, ACE, additivity and variance stabilization, AVAS, locally weighted regression, LOESS, neural networks


The ever-growing number of high-dimensional, superlarge databases requires effective analysis techniques to mine interesting information from the data. Development of new-wave methodologies for high-dimensional nonparametric regression has exploded over the last decade in an effort to meet these analysis demands. This paper reports on an extensive simulation experiment that compares the performance of ten commonly used regression techniques: linear regression, stepwise linear regression, additive models (AM), projection pursuit regression (PPR), recursive partitioning regression (RPR), multivariate adaptive regression splines (MARS), alternating conditional expectations (ACE), additivity and variance stabilization (AVAS), locally weighted regression (LOESS), and neural networks. Each regression technique was used to analyze multiple datasets, each having a unique embedded structure; the accuracy of each technique was measured by its ability to correctly identify the embedded structure, averaged over all the datasets. The datasets were constructed to cover a range of characteristics by varying five factors: the dimension of the data, the true dimension of the embedded structure, the sample size, the amount of noise, and the complexity of the embedded structure. Analyses of the results show that all five factors affect the accuracy of every regression technique under investigation. A mapping from data characteristics to the most effective regression technique(s) is suggested.
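
The kind of simulation design described above can be sketched as follows. This is only an illustration, not the authors' procedure: the factor levels, the two structure functions, the uniform sampling on the unit cube, and the names make_dataset and design_grid are all assumptions, since the abstract does not give these details.

```python
import itertools
import math
import random

# Hypothetical factor levels; the abstract does not state the values used.
DIMENSIONS = [2, 6, 12]      # p: number of explanatory variables
SAMPLE_SIZES = [50, 200]     # n: observations per dataset
NOISE_SDS = [0.1, 0.5]       # sigma: standard deviation of additive noise
STRUCTURES = {
    # Illustrative embedded structures acting on the active coordinates only.
    "linear": lambda z: sum(z),
    "product": lambda z: math.prod(z),
}


def make_dataset(p, q, n, sigma, f, rng):
    """Draw n points uniformly in [0, 1]^p (an assumption).

    The response depends only on the first q coordinates, so q plays
    the role of the true dimension of the embedded structure.
    """
    xs = [[rng.random() for _ in range(p)] for _ in range(n)]
    ys = [f(x[:q]) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def design_grid():
    """Yield every (p, q, n, sigma, structure name) cell with 1 <= q <= p."""
    for p, n, sigma, name in itertools.product(
        DIMENSIONS, SAMPLE_SIZES, NOISE_SDS, STRUCTURES
    ):
        # Low, middle, and full true dimension, without duplicates.
        for q in sorted({1, p // 2, p}):
            yield p, q, n, sigma, name


if __name__ == "__main__":
    rng = random.Random(0)
    for p, q, n, sigma, name in design_grid():
        xs, ys = make_dataset(p, q, n, sigma, STRUCTURES[name], rng)
        # A regression technique would be fitted to (xs, ys) here and
        # scored on how well it recovers STRUCTURES[name].
```

Each cell of the grid yields one dataset; a comparison in the spirit of the report would fit every technique to every dataset and average an accuracy measure over the cells.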

56 pages

