CMU-ISR-15-108
Institute for Software Research
School of Computer Science, Carnegie Mellon University



Topic Modeling in Large Scale Social Network Data

Aman Ahuja*, Wei Wei, Kathleen M. Carley

December 2015

Center for the Computational Analysis of Social and Organizational Systems
CASOS Technical Report


Keywords: Topic Modeling, Social Network Analysis, Probabilistic Graphical Models

The growing popularity of social media platforms such as Twitter and Facebook has made them an important source of information. The large amount of data available on these platforms presents new opportunities for mining information about the real world.

Because these platforms are so widely used, much useful information can be extracted from the text posted on them. This information can be used to infer important aspects of the users of these services and of the events happening around them.

This work proposes generative probabilistic models to identify latent topics and sentiments in social media data, mainly Twitter. In contrast to most earlier work on topic modeling of social media data, these models incorporate the special characteristics of this data, mainly the short length of posts and special tokens such as hashtags. The proposed models were evaluated qualitatively and quantitatively against several baseline models. Experimental results suggest several improvements over the existing baseline techniques.

35 pages

*Visiting Undergraduate Student, BITS Pilani - K.K. Birla Goa Campus, India

