CMU-S3D-26-105
Software and Societal Systems Department
School of Computer Science, Carnegie Mellon University

Supporting the Sustainable Use of Open Source Software
Courtney Elta Miller
April 2026
Ph.D. Thesis
In this dissertation, I study how to support and improve the processes used by developers facing open source dependency abandonment disruptions. Open source software forms the digital infrastructure that most modern software is built on, and expectations of ongoing maintenance are a widespread norm despite the reality that many open source packages, even widely used ones, become abandoned. Package abandonment can disrupt supply chain integrity for millions of downstream users and increase software supply chain attack surfaces. These disruptions threaten the stability of critical digital infrastructure, including healthcare systems, financial services, and transportation networks. Addressing these disruptions is essential for ensuring software reliability, security, and the resilience of our broader digital economy. While this discrepancy between the expectations placed on open source and its reality has fueled the need to study and improve open source sustainability, the majority of that research has focused on attempting to prevent abandonment disruptions. My dissertation takes a different approach: I shift the focus away from maintainers and study abandonment disruptions from the user perspective, with the goal of enabling the sustainable use of open source digital infrastructure by helping users better navigate abandonment disruptions when they occur. I leverage a three-step methodological approach to explore, measure, and improve how developers navigate abandonment disruptions. Throughout these three steps, representing the three core chapters of this dissertation, I develop rigorous multi-dimensional empirical mixed-method approaches, combining human-centered qualitative techniques with large-scale data-driven statistical analysis, modeling, and visualization. By systematically studying abandonment as a disruption, I reveal developer challenges and inform the design of targeted solutions.
More specifically, I begin by understanding abandonment disruptions through the first two steps. First, I explore how developers currently deal with abandonment disruptions by going straight to the source and interviewing developers who have faced open source dependency abandonment. I contextualize and curate their experiences, how they deal with abandonment, and both the key challenges they face and potential solutions. Additionally, I present a theoretical framework on the cost of dependency abandonment, introduce the concept of community-oriented solutions, and provide evidence-based strategies from fields like social psychology and game theory to overcome the volunteer's dilemma and encourage collective responses. Then, to measure the scale of abandonment disruptions and the current state of user response in practice, I perform a large-scale quantitative analysis measuring the prevalence of and response to abandonment across the JavaScript npm ecosystem. I employ a series of statistical modeling techniques to quantify the impact of various factors, such as providing users with explicit notice of the abandonment, on the likelihood and speed of downstream user response. This analysis reveals that while downstream response is uncommon, increasing information transparency surrounding abandonment can help support and accelerate downstream responses, aligning with the challenges of identifying abandonment that developers described in the first (explore) step. Through the first two steps, I demonstrate that abandonment is an under-supported and understudied disruption that many developers struggle both to identify and to respond to, given the current lack of tooling and guidance. In the third step, I help improve how developers navigate abandonment disruptions by designing an intervention in the form of a prototype tool for automatically identifying dependency abandonment.
However, I also learned that not all dependency abandonment is equally concerning to developers; instead, many are primarily concerned about abandonment they believe would be impactful to their project. Moreover, research on existing software composition analysis (SCA) tools for related dependency management practices (e.g., updates and vulnerability patches) illustrates that a key usability issue limiting the effectiveness of these tools is that they overwhelm developers with too many notifications, particularly ones deemed irrelevant, which can frustrate developers and lead to tool disengagement. With this key limitation in mind, I developed a prototype tool to support the automated identification of abandoned dependencies without overwhelming developers, by notifying them only about dependency abandonment that is likely impactful and therefore noteworthy to their project. My key questions became (1) what abandonment will be impactful to a particular project given the context of its dependency usage; and (2) how to make such predictions at scale. I hypothesize and later demonstrate that our approach, which equips large language models (LLMs) with theory-driven reasoning and context-specific information, predicts the impact of abandonment more accurately than LLMs alone and can thereby support developer decision making. I conducted need-finding interviews with 22 developers to develop a theoretical understanding of what factors influence the impactfulness of a dependency's abandonment on a project given the context of its dependency usage. I then leveraged this theory to develop an LLM-based classifier to predict the project-specific impact of abandonment using theory-driven reasoning and context-specific information.
Finally, through an independent evaluation with 124 developers, I demonstrate that our classifier is effective at predicting project-specific impactfulness, and I show the promise of this method for creating tooling with intelligent defaults in contexts where traditional tooling approaches have failed and theory or design work is still essential. My dissertation work focuses on helping developers navigate abandonment disruptions, but my three-step explore-measure-improve approach to studying disruptions has much wider applicability. For example, I will study other sociotechnical disruptions, such as the integration of Generative AI tools into development workflows, which I am currently working on, as well as the unanticipated disruptions of tomorrow.
105 pages
Nicolas Christin, Head, Software and Societal Systems Department