CMU-ISR-20-104 Institute for Software Research School of Computer Science, Carnegie Mellon University
Social Media Analytics for Stance Mining
Sumeet Kumar
May 2020
Ph.D. Thesis
People express their opinions on blogs and other social media platforms. As per a recent estimate, interactions on Twitter alone result in over 500 million tweets per day. The magnitude of this data enables new applications of opinion mining that have previously remained challenging, e.g., finding users' stance (as in pro or con) on topics of interest. However, one of the major barriers to utilizing this amount of data is the cost of hand-labeling examples for machine learning. This barrier is even more apparent in stance mining, as opinions can change over time and can be about any issue. To reduce the need for hand-labeled data by taking the complex interactions of social media users and their social influence into account, this dissertation develops semi-supervised methods for stance mining. Most existing studies on stance mining take a simplistic view that assumes a sentence (like a tweet) holds a perspective that is independent of the context and the author's network position. This approach to stance learning leaves three crucial challenges unresolved. First, how do we train stance-learning models on new topics with minimal labeling effort? Discussion topics change fast and new issues emerge, making it difficult to reuse prior labeled data. However, artifacts of social networks like hashtags can give a noisy signal about the stance of users. To extract the signal from the noise, I develop methods to find useful hashtags by exploiting how users in the pro-group and the anti-group use popular hashtags. Second, how can we use multiple interaction modalities for stance mining? Users' opinions are evident in different types of interactions, e.g., tweeting, retweeting, or liking. I develop a semi-supervised method based on co-training that jointly trains multiple stance classifiers using different interaction modalities, resulting in a better stance prediction model. Third, how can we leverage users' networks for stance prediction? 
Current approaches to stance learning ignore important network factors such as the interactions of social media users (e.g., a person's preference can also be inferred from their friends' preferences). I use network alignment as one of the training signals to train the stance classifiers. My thesis brings a new direction to the stance learning problem that is grounded in social theory, is more amenable to analyzing activities on social media, and allows effective learning from multiple types of interactions without requiring large amounts of labeled data. By labeling only a few hashtags used in Twitter conversations on a few controversial topics, my approach allows for predicting both the stance of users (as in whether they are for or against a topic) with over 80% accuracy and the stance in conversations (as in whether they favor or deny others' posts) with over 70% accuracy.
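To make the idea of labeling only a few hashtags concrete, the following is a minimal Python sketch of how hand-labeled seed hashtags might be turned into noisy user-level stance labels. It is an illustration under stated assumptions, not the dissertation's method: the seed hashtags, the function name `weak_user_label`, and the `min_margin` threshold are all hypothetical, and the sketch ignores the hashtag-selection, co-training, and network signals described above.

```python
from collections import Counter

# Hypothetical seed hashtags, hand-labeled for a single topic (illustrative only).
SEED_HASHTAGS = {
    "#supportthebill": "pro",
    "#yesonbill": "pro",
    "#stopthebill": "con",
    "#noonbill": "con",
}


def weak_user_label(tweets, seeds=SEED_HASHTAGS, min_margin=1):
    """Return 'pro', 'con', or None based on seed-hashtag usage in a user's tweets.

    A user gets a label only when one side's seed hashtags outnumber the
    other's by at least `min_margin`; otherwise the user stays unlabeled.
    """
    counts = Counter()
    for text in tweets:
        for token in text.lower().split():
            tag = token.rstrip(".,!?;:")  # drop trailing punctuation
            if tag in seeds:
                counts[seeds[tag]] += 1
    margin = counts["pro"] - counts["con"]
    if margin >= min_margin:
        return "pro"
    if -margin >= min_margin:
        return "con"
    return None  # no seed hashtags, or evidence is balanced


if __name__ == "__main__":
    print(weak_user_label(["Vote today #SupportTheBill!", "#yesonbill"]))
    print(weak_user_label(["#StopTheBill now"]))
    print(weak_user_label(["No hashtags here"]))
```

Labels produced this way are noisy by construction; in a semi-supervised setting they would serve only as a starting signal for training classifiers, with users who return `None` treated as unlabeled data.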
162 pages
James D. Herbsleb, Director, Institute for Software Research