CMU-ISRI-04-127
Institute for Software Research International
School of Computer Science, Carnegie Mellon University




Sentiment Extraction from Unstructured Text using
Tabu Search-Enhanced Markov Blanket

Xue Bai, Rema Padman, Edoardo Airoldi

July 2004



Keywords: Bayesian Models, Opinion, Sentiments, Semantic Orientation, Semantic Learning, Information Retrieval, Text Analysis, Text Classification, Bayesian Network, Markov Blanket, Tabu Search, Local Dependencies.


Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine on-line opinions from the Internet and learn customers' preferences for economic or marketing research, or to gain a strategic advantage.

In this paper, we propose a two-stage Bayesian algorithm that captures the dependencies among words and, at the same time, finds a vocabulary that is efficient for extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set and yields better predictions of sentiment orientation than several state-of-the-art machine learning methods.

Our findings suggest that sentiments are captured by conditional dependence relations among words, rather than by keywords or high-frequency words.
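
As an illustration of the Markov blanket concept named in the title, the sketch below computes the Markov blanket of a target node in a small directed acyclic graph: its parents, its children, and the other parents of its children. Given these nodes, the target is conditionally independent of every other node in the network. This is only the textbook definition and not the two-stage algorithm of the report; the function name markov_blanket, the toy graph, and the word nodes in it are hypothetical and chosen purely for illustration.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its parent nodes.
    The blanket is: parents of target, children of target,
    and the other parents of those children (spouses).
    """
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the co-parents of the target's children.
    spouses = {p for child in children for p in parents[child]}
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)
    return blanket


# Hypothetical toy network (assumption, not taken from the report):
# "plot" -> "sentiment" -> "great" <- "not", and "movie" -> "film".
toy_dag = {
    "sentiment": {"plot"},
    "great": {"sentiment", "not"},
    "not": set(),
    "plot": set(),
    "movie": set(),
    "film": {"movie"},
}

blanket = markov_blanket(toy_dag, "sentiment")
print(sorted(blanket))
```

In this toy graph the words "movie" and "film" fall outside the blanket, so under the graph's independence assumptions they carry no additional information about the sentiment node once the blanket words are known.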

13 pages

