CMU-CS-99-102
Computer Science Department
School of Computer Science, Carnegie Mellon University
Comparing Methods for Multivariate Nonparametric Regression
David L. Banks, Robert T. Olszewski, Roy A. Maxion
January 1999
Keywords: Multivariate nonparametric regression, linear
regression, stepwise linear regression, additive models, AM, projection
pursuit regression, PPR, recursive partitioning regression, RPR, multivariate
adaptive regression splines, MARS, alternating conditional expectations,
ACE, additivity and variance stabilization, AVAS, locally weighted regression,
LOESS, neural networks
The ever-growing number of high-dimensional, superlarge databases
requires effective analysis techniques to mine interesting information
from the data. Development of new-wave methodologies for
high-dimensional nonparametric regression has exploded over the last
decade in an effort to meet these analysis demands. This paper reports
on an extensive simulation experiment that compares the performance of
ten commonly used regression techniques: linear regression,
stepwise linear regression, additive models (AM), projection pursuit
regression (PPR), recursive partitioning regression (RPR), multivariate
adaptive regression splines (MARS), alternating conditional
expectations (ACE), additivity and variance stabilization (AVAS),
locally weighted regression (LOESS), and neural networks. Each
regression technique was used to analyze multiple datasets, each having
a unique embedded structure; the accuracy of each technique was
determined by its ability to correctly identify the embedded structure,
averaged over all the datasets. Datasets used in the experiment were
constructed to have a range of characteristics by varying the dimension
of the data, the true dimension of the embedded structure, the sample
size, the amount of noise, and the complexity of the embedded
structure. Analyses of the results show that all of these properties
affect the accuracy of each regression technique under investigation.
A mapping from data characteristics to the most effective regression
technique(s) is suggested.
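
The kind of simulation described above can be illustrated with a minimal Python sketch. This is not the report's actual experimental design: the uniform predictors, the additive embedded structure, the Gaussian noise, the mean-squared-error criterion, and every function name below are assumptions chosen only for illustration, and ordinary linear regression stands in for the ten techniques.

```python
import numpy as np

def make_dataset(n, p, q, sigma, rng):
    """Draw n points in [0, 1]^p; only the first q coordinates carry signal."""
    X = rng.uniform(0.0, 1.0, size=(n, p))
    signal = X[:, :q].sum(axis=1)  # hypothetical embedded structure
    y = signal + rng.normal(0.0, sigma, size=n)  # additive Gaussian noise (assumed)
    return X, y, signal

def fit_linear(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model at the rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def simulate(n, p, q, sigma, seed=0):
    """Fit on noisy data, then score against the noise-free structure."""
    rng = np.random.default_rng(seed)
    X, y, _ = make_dataset(n, p, q, sigma, rng)
    coef = fit_linear(X, y)
    X_test, _, truth = make_dataset(2000, p, q, 0.0, rng)
    return float(np.mean((predict_linear(coef, X_test) - truth) ** 2))

if __name__ == "__main__":
    # Vary sample size and noise level while holding the dimensions fixed.
    for n in (50, 500):
        for sigma in (0.1, 1.0):
            print(n, sigma, simulate(n, p=6, q=2, sigma=sigma))
```

Replacing `fit_linear` and `predict_linear` with another estimator, and the sum in `make_dataset` with a more complex function, would give the same loop over data dimension, true dimension, sample size, and noise for other techniques and structures.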
56 pages