CMU-ISR-22-108 Institute for Software Research School of Computer Science, Carnegie Mellon University
Leveraging Stances in Conversations for the Ramon Villa-Cox July 2022
Ph.D. Thesis
There is an ongoing policy discussion regarding the impact of the observed polarization online, how it affects the spread of false information, and what, if anything, should be done to curtail it. Designing and implementing effective and efficient interventions in this area requires a detailed understanding of how the members of polarized communities interact with each other and with outsiders holding opposing views. The focus of this dissertation is the study of polarized Twitter communities, and the spread of disinformation through them, during contentious events. Due to its large number of users, Twitter has become one of the primary social media platforms for acquiring, sharing, and spreading information. However, it has also become a source of misinformation and polarization. The effect may not be crucial when the subject at hand is trivial, but it gains undeniable importance during events of global concern. A significant amount of research on information diffusion through this medium has focused on retweeting, despite it being only one potential reaction to information found on Twitter. As shown in this work, this focus can be misleading, particularly when characterizing the spread of disinformation or when identifying polarized communities. To address this, we explore two different subareas of the identification of stance in Twitter conversations: one that seeks to identify a user's stance towards a pre-defined target (target stance classification) and one that focuses on the stance towards messages from other users (conversation stance classification). Analyzing such conversations is difficult and requires complex natural language processing models that often rely on copious amounts of labeled data. These issues are amplified when working in languages other than English, as labeled resources are scarcer.
In pursuit of the objectives set forth in this work, we developed a weak-labeling methodology for target stance detection that requires minimal labeling effort, and constructed one of the first labeled datasets in Spanish for the identification of stance in conversations. This dataset was constructed to provide a unified benchmark for the detection of both polarized online discussions and rumors. These resources are then used to develop state-of-the-art stance classifiers for exploring polarized Twitter communities during a major political event that shocked the South American region at the end of 2019. For example, results show that a user's tendency to share information consistent with their views of the government is not consistent with the "filter bubble" explanation for polarization. That is, we show that users from both sides of the ideological spectrum actively engaged with each other (mostly negatively). This implies that the observed phenomenon is more consistent with polarized social media practices that reflect confirmation bias on the part of the users.
128 pages
James D. Herbsleb, Director, Institute for Software Research