INSTITUTE FOR SOFTWARE RESEARCH TECHNICAL REPORT ABSTRACTS

CMU-ISR-20-104
Institute for Software Research
School of Computer Science, Carnegie Mellon University

Social Media Analytics for Stance Mining
A Multi-Modal Approach with Weak Supervision

Sumeet Kumar

May 2020

Ph.D. Thesis
Societal Computing

Keywords: Social Media, Social Networks, Opinion Mining, Semi-Supervised, Stance

People express their opinions on blogs and other social media platforms. By a recent estimate, interactions on Twitter alone produce over 500 million tweets per day. The scale of this data enables new applications of opinion mining that were previously challenging, e.g., finding users' stance (pro or con) on topics of interest. However, a major barrier to using this much data is the cost of hand-labeling examples for machine learning. The barrier is even more pronounced in stance mining, because opinions change over time and can concern any issue. To reduce the need for hand-labeled data, this dissertation develops semi-supervised methods for stance mining that take into account the complex interactions of social media users and their social influence.

Most existing studies on stance mining take a simplistic view: they assume that a sentence (such as a tweet) holds a perspective that is independent of its context and of the author's network position. This approach to stance learning leaves three crucial challenges unresolved. First, how do we train stance-learning models on new topics with minimal labeling effort? Discussion topics change quickly and new issues emerge, making it difficult to reuse previously labeled data. However, artifacts of social networks such as hashtags can give a noisy signal about the stance of users. To extract the signal from the noise, I develop methods that find useful hashtags by exploiting how users in the pro group and the anti group use popular hashtags. Second, how can we use multiple interaction modalities for stance mining? Users' opinions are evident in different types of interactions, e.g., tweeting, retweeting, or liking. I develop a semi-supervised method based on co-training that jointly trains multiple stance classifiers on different interaction modalities, resulting in a better stance prediction model. Third, how can we leverage users' networks for stance prediction? Current approaches to stance learning ignore important network factors such as the interactions of social media users (e.g., a person's preference can also be inferred from their friends' preferences). I use network alignment as one of the training signals for the stance classifiers.
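
The hashtag step described above can be illustrated with a small sketch. It is not the method developed in the thesis; it only shows the general idea under simple assumptions: a few seed hashtags are hand-labeled, users are assigned to a group by majority vote over the seed hashtags they used, and every hashtag is then scored by how unevenly the two groups use it. The function names (label_users, hashtag_skew), the min_users threshold, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def label_users(user_hashtags, seed_labels):
    """Weakly label users as 'pro' or 'anti' by majority vote over seed hashtags."""
    labels = {}
    for user, tags in user_hashtags.items():
        votes = Counter(seed_labels[t] for t in tags if t in seed_labels)
        pro, anti = votes["pro"], votes["anti"]
        if pro != anti:  # skip users with no seed hashtags or a tied vote
            labels[user] = "pro" if pro > anti else "anti"
    return labels


def hashtag_skew(user_hashtags, user_labels, min_users=2):
    """Score each hashtag in [-1, 1]: +1 if only pro users use it, -1 if only anti users do."""
    counts = defaultdict(Counter)
    for user, tags in user_hashtags.items():
        side = user_labels.get(user)
        if side is None:
            continue  # unlabeled users contribute nothing
        for tag in set(tags):  # count each user at most once per hashtag
            counts[tag][side] += 1
    scores = {}
    for tag, c in counts.items():
        total = c["pro"] + c["anti"]
        if total >= min_users:  # ignore hashtags with too little evidence
            scores[tag] = (c["pro"] - c["anti"]) / total
    return scores


# Hypothetical toy data: hashtags used by each user.
user_hashtags = {
    "u1": ["#yes", "#vote"],
    "u2": ["#yes", "#vote", "#news"],
    "u3": ["#no", "#stop", "#news"],
    "u4": ["#no", "#stop"],
    "u5": ["#news"],
}
seed_labels = {"#yes": "pro", "#no": "anti"}  # the only hand-labeled items

user_labels = label_users(user_hashtags, seed_labels)
scores = hashtag_skew(user_hashtags, user_labels)
```

In this toy data, "#vote" and "#stop" are used by one group only and so would be candidates for additional noisy labels, while "#news" is used by both groups and carries no stance signal.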

My thesis brings a new direction to the stance learning problem that is grounded in social theory, is more amenable to analyzing activity on social media, and allows effective learning from multiple types of interactions without requiring large amounts of labeled data. By labeling only a few hashtags used in Twitter conversations on a few controversial topics, my approach predicts both the stance of users (whether they are for or against a topic) with over 80% accuracy and the stance in conversations (whether replies favor or deny others' posts) with over 70% accuracy.

162 pages

Thesis Committee:
Kathleen M. Carley (Chair)
Tom Mitchell
Louis-Philippe Morency
Huan Liu (Arizona State University)

James D. Herbsleb, Director, Institute for Software Research
Martial Hebert, Dean, School of Computer Science
