CMU-ML-12-106
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Attribute Learning using
Joint Human and Machine Computation

Edith L. M. Law

August 2012

Ph.D. Thesis



Keywords: Attribute Learning, Human Computation, Games with a Purpose, Machine Learning


This thesis centers on the problem of attribute learning – using the joint effort of humans and machines to describe objects, e.g., determining that a piece of music is "soothing," that the bird in an image "has a red beak," or that Ernest Hemingway is a "Nobel Prize-winning author." We present new methods for solving the attribute-learning problem that combine the efforts of the crowd and machines via human computation games.

When creating a human computation system, two design objectives typically need to be satisfied simultaneously. The first objective is human-centric – the task prescribed by the system must be intuitive, appealing and easy to accomplish for human workers. The second objective is task-centric – the system must actually perform the task at hand. These two goals are often at odds with each other, especially in the casual game setting. This thesis shows that human computation games can accomplish both the human-centric and task-centric objectives if we first design for humans, then devise machine learning algorithms that work around the limitations of human workers and complement their abilities in order to jointly accomplish the task of learning attributes. We demonstrate the effectiveness of our approach in three concrete problem settings: music tagging, bird image classification and noun phrase categorization.

Contributions of this thesis include a framework for attribute learning, two new game mechanisms, experiments showing the effectiveness of the hybrid human and machine computation approach for learning attributes in vocabulary-rich settings and under the constraints of knowledge limitations, as well as deployed games played by tens of thousands of people, generating large datasets for machine learning.

145 pages


SCS Technical Report Collection
School of Computer Science