CMU-ISR-20-103
Institute for Software Research
School of Computer Science, Carnegie Mellon University




Shurui Zhou

May 2020

Ph.D. Thesis
Software Engineering

CMU-ISR-20-103.pdf


Keywords: Collaborative Software Development, Distributed Collaboration, Fork-Based Development, Social Coding, GitHub, Open-Source

Fork-based development is a lightweight mechanism that allows developers to collaborate with or without explicit coordination. Recent advances in distributed version control systems (e.g., 'git') and social coding platforms (e.g., GitHub) have made fork-based development relatively easy and popular by providing support for tracking changes across multiple forks with a common vocabulary and a mechanism for integrating changes back. However, fork-based development has well-known downsides. When developers each create their own fork and develop independently, their contributions are usually not easily visible to others unless they actively attempt to merge their changes back into the original project. As the number of forks grows, it becomes very difficult to keep track of decentralized development activity across many forks. The key problem is that it is difficult to maintain an overview of what happens in individual forks, and thus of the project's scope and direction. Furthermore, lacking an overview of forks can lead to several additional problems and inefficient practices: lost contributions, redundant development, fragmented communities, and so on.

In this dissertation, I combined a wide range of research methods to understand both the problem space and the solution space. Specifically, I first designed measures to quantify how serious these inefficiencies are, and then developed two complementary strategies to alleviate the problem. First, while sampling 1311 GitHub projects and quantifying the inefficiencies, and while opportunistically reaching out to developers who have used forks, I recognized that projects differ. Therefore, I identified existing best practices and suggested evidence-based interventions for projects that are inefficient. Moreover, I observed that the notion of forking has changed since the invention of fork-based development, so I conducted a mixed-method study, which included interviewing developers, to understand how forking is perceived, and identified future research directions. Second, I found that the lack of an overview observed in fork-based development environments is essentially the same as the lack-of-awareness problem that has been studied previously in other distributed software development scenarios, but with new challenges. I therefore designed awareness tools to improve awareness in fork-based development environments and to help developers detect redundant development, reducing unnecessary effort. To evaluate the effectiveness and usefulness of these awareness tools, I conducted both quantitative and qualitative studies.

My dissertation work focuses on improving collaboration efficiency for distributed software teams, but the research method has much wider applicability. For example, in the future, I will study other forms of collaboration, such as collaboration within interdisciplinary software teams.

134 pages

Thesis Committee:
Christian Kästner (Chair)
James D. Herbsleb
Laura A. Dabbish
Andrzej Wasowski (IT University of Copenhagen)

James D. Herbsleb, Director, Institute for Software Research
Martial Hebert, Dean, School of Computer Science

