CMU-ISRI-06-111
Institute for Software Research
School of Computer Science, Carnegie Mellon University
Application of a Probability-Based Algorithm to
Extraction of Product Features from Online Reviews
Christopher Scaffidi
June 2006
Keywords: Information extraction, mining, personalization, product
reviews
Prior research has demonstrated the viability of automatically extracting
product features from online reviews. This paper presents a
probability-based algorithm and compares it to an existing support-based
approach. Specifically, I used each algorithm to extract features from
seven Amazon.com product categories and then asked end users to rate the
features in terms of helpfulness for choosing products. The end users
preferred the features identified by the probability-based algorithm.
This algorithm can identify features that comprise a single noun or two
successive nouns (end users rated the two-noun features as more helpful
than single-noun features), and even for collections of tens of thousands
of reviews, it executes fast enough (about 1 ms per review) for
practical use.
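
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of what a probability-based extractor of this kind might look like, not the paper's actual method. It assumes that reviews arrive already part-of-speech tagged (the tag "N" for nouns is hypothetical), that a table of baseline noun rates in generic English is available (`baseline_rate`, also hypothetical), that counts follow a Poisson model, and that the baseline rate of a two-noun phrase is the product of its nouns' rates. Candidates whose observed count would be very improbable under the baseline are ranked first.

```python
import math
from collections import Counter


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def rank_features(tagged_reviews, baseline_rate, default_rate=1e-4):
    """Rank single nouns and two-successive-noun phrases.

    tagged_reviews: list of reviews, each a list of (token, tag) pairs;
                    the tag "N" marks a noun (an assumption of this sketch).
    baseline_rate:  dict mapping a noun to its assumed per-token rate in
                    generic English; unknown nouns get default_rate.
    Returns (candidate, probability) pairs, least probable first.
    """
    counts = Counter()
    total_tokens = 0
    for review in tagged_reviews:
        total_tokens += len(review)
        for i, (token, tag) in enumerate(review):
            if tag != "N":
                continue
            counts[(token,)] += 1
            # A feature may also be two successive nouns.
            if i + 1 < len(review) and review[i + 1][1] == "N":
                counts[(token, review[i + 1][0])] += 1

    scored = []
    for nouns, observed in counts.items():
        # Assumed independence: a phrase's rate is the product of its nouns' rates.
        rate = 1.0
        for noun in nouns:
            rate *= baseline_rate.get(noun, default_rate)
        expected = rate * total_tokens
        scored.append((" ".join(nouns), poisson_tail(observed, expected)))
    scored.sort(key=lambda pair: pair[1])
    return scored
```

Each review is visited once and each token does a constant amount of work, so the cost grows linearly with the size of the collection, which is consistent in spirit with the per-review timing reported above.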
15 pages