Machine Learning Department
School of Computer Science, Carnegie Mellon University
Rare Category Analysis
This thesis focuses on rare category analysis, where the majority classes have a smooth distribution and the minority classes exhibit a compactness property. Furthermore, we focus on the challenging cases where the support regions of the majority and minority classes overlap. To the best of our knowledge, this thesis is the first end-to-end investigation of rare categories.
Depending on the availability of label information, we can perform either supervised or unsupervised rare category analysis. In the supervised setting, our first task is rare category detection, which is to discover at least one example from each minority class with the help of a labeling oracle. Then, given labeled examples from all the classes, our second task is rare category characterization. The goal here is to find a compact representation for the minority classes in order to identify all the rare examples with high precision and recall. In the unsupervised setting, on the other hand, we do not have access to a labeling oracle. Here we propose to co-select candidate examples from the minority classes and the relevant features, which benefits both tasks (rare category selection and feature selection). For each of the above tasks, we have developed effective algorithms with theoretical guarantees as well as good empirical results.
In the future, we plan to apply rare category analysis to rich data, such as medical images, texts and blogs, Electronic Health Records (EHR), web link graphs, and stream data. We also plan to build statistical models for the rare categories in order to understand how they emerge and evolve over time, to study complex fraud based on rare category analysis, to make use of transfer learning to help with our analysis, and to build a complete system for rare category analysis.