Institute for Software Research
School of Computer Science, Carnegie Mellon University
Topic Modeling in Large Scale Social Network Data
Aman Ahuja*, Wei Wei, Kathleen M. Carley
Center for the Computational Analysis of Social and Organizational Systems
The growing popularity of social media such as Twitter and Facebook has made these websites an important source of information. The large amount of data available on these platforms presents new opportunities for mining information about the real world.
Because these platforms are so widely used, a great deal of useful information can be extracted from the text posted on them. This text can be used to infer important aspects of the users of these services and of the events happening in their surroundings.
This work proposes generative probabilistic models to identify latent topics and sentiments in social media data, mainly from Twitter. In contrast to the majority of earlier work on topic modeling in social media data, this work incorporates several special characteristics of this data, chiefly the short length of posts and special tokens such as hashtags. The models proposed in this work were compared qualitatively and quantitatively against several baseline models. Experimental results suggest several improvements over the existing baseline techniques.