Machine Learning Department
School of Computer Science, Carnegie Mellon University


Benjamin Shih

December 2011



Researchers have discovered many successful algorithms and methodologies for solving problems at the intersection of machine learning and education research. This umbrella field, "educational data mining," has enjoyed a series of successes that span the research process, from post-hoc data analysis that generates models to the use of those models in successful educational interventions. However, most of these successes have relied on pre-existing psychological and educational constructs (e.g., guessing), and thus on semi-supervised or fully supervised machine learning algorithms. Algorithms for novel discovery, such as unsupervised clustering, have enjoyed significantly fewer successes in this domain, partly because education data exhibit unique, complex structure.

This thesis combines algorithm development, simulation, and experimentation on real-world data, all designed to define and test a novel paradigm for clustering in education (and a range of other domains). This paradigm, target clustering, revolves around the inclusion of high-level targets, such as student learning from pre-test to post-test. It differs from existing machine learning approaches in that it is designed, from initial concept to final execution, to solve educational research problems, exploiting the structural complexities that cause problems for other algorithms. The thesis uses a range of data sets drawn from a variety of research domains, but does not report new data from psychological experiments. It does, however, analyze methodology, results, and implications from an educational research perspective, and it relies entirely on education data and research problems.
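The abstract does not specify the target clustering algorithm itself. As a purely illustrative sketch (an assumption, not the thesis's method), one simple way to "include a high-level target" is to run ordinary k-means several times and keep the clustering whose cluster membership best explains the target, here a hypothetical per-student learning gain (post-test minus pre-test). The score used is the fraction of target variance explained by cluster membership (eta squared). The functions `kmeans`, `target_score`, and `target_guided_clustering`, and the toy data, are all hypothetical.

```python
import random
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for step in range(iters):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c])) for pt in points]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(dim) for dim in zip(*members)]
    return labels


def target_score(labels, targets):
    """Fraction of target variance explained by cluster membership (eta squared)."""
    grand = mean(targets)
    total = sum((t - grand) ** 2 for t in targets)
    if total == 0:
        return 0.0
    between = 0.0
    for c in set(labels):
        group = [t for t, lab in zip(targets, labels) if lab == c]
        between += len(group) * (mean(group) - grand) ** 2
    return between / total


def target_guided_clustering(points, targets, k, restarts=20, seed=0):
    """Hypothetical target-guided selection: keep the restart that best explains the target."""
    rng = random.Random(seed)
    best_labels, best_score = None, -1.0
    for _ in range(restarts):
        labels = kmeans(points, k, rng)
        score = target_score(labels, targets)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Toy example: two feature groups of students with clearly different learning gains.
points = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
gains = [0.05, 0.10, 0.00, 0.60, 0.55, 0.65]
labels, score = target_guided_clustering(points, gains, k=2)
print(labels, round(score, 3))
```

Here the target is used only to choose among candidate clusterings; a method that builds the target into the clustering objective itself would be a different design, and this sketch makes no claim about which the thesis uses.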

210 pages
