CMU-ISRI-06-111
Institute for Software Research
School of Computer Science, Carnegie Mellon University




Application of a Probability-Based Algorithm to
Extraction of Product Features from Online Reviews

Christopher Scaffidi

June 2006



Keywords: Information extraction, mining, personalization, product reviews


Prior research has demonstrated the viability of automatically extracting product features from online reviews. This paper presents a probability-based algorithm and compares it to an existing support-based approach. Specifically, I used each algorithm to extract features from seven Amazon.com product categories and then asked end users to rate how helpful the features were for choosing products. The end users preferred the features identified by the probability-based algorithm. This algorithm can identify features comprising either a single noun or two successive nouns, and end users rated the two-noun features as more helpful than the single-noun ones. Even for collections of tens of thousands of reviews, it executes fast enough (around 1 ms per review) for practical use.

15 pages

