CMU-ISR-21-101
Institute for Software Research
School of Computer Science, Carnegie Mellon University



Improving Patch Quality by Enhancing Key
Components of Automatic Program Repair

Mauricio Soto

February 2021

Ph.D. Thesis
Software Engineering

CMU-ISR-21-101.pdf


Keywords: Automatic Program Repair, APR, Patch Quality, Test Suites, Mutation Operators, Diversity

Repairing errors in software systems has historically been a resource-consuming task that relies heavily on manual developer effort. Automatic program repair approaches enable the repair of software with minimal human interaction, mitigating the burden on developers, reducing the costs of manual debugging, and increasing software quality.

However, a fundamental problem with current automatic program repair approaches is that they can generate low-quality patches that overfit to one program specification, as described by the guiding test suite, and do not generalize to the intended specification.

This dissertation rigorously explores this phenomenon on real-world Java programs and describes a set of mechanisms to enhance key components of the automatic program repair process to generate higher quality patches. These mechanisms include an analysis of test suite behavior and their key characteristics for automatic program repair. We analyze the effectiveness of three well-known repair techniques (GenProg, PAR, and TrpAutoRepair) on defects made by the projects' developers during their regular development process, and we analyze the impact that modifying test suite characteristics such as size, coverage, provenance, and number of failing test cases has on the quality of the produced patches.

A second mechanism toward increasing patch quality addresses a set of research questions aimed at analyzing developer code changes to inform the mutation operator selection distribution. We create a probabilistic model that describes how often human developers choose each of the different mutation operators available to automated repair techniques, and we later use this probabilistic model to create an APR approach informed by this distribution to generate higher quality patches.
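
As an illustration only, the idea of drawing mutation operators according to how often developers use them can be sketched as weighted sampling over an empirical distribution. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the operator names, the edit counts, and the functions `operator_distribution` and `choose_operator` are all hypothetical.

```python
import random
from collections import Counter


def operator_distribution(observed_edits):
    """Turn a list of observed operator labels into relative frequencies."""
    counts = Counter(observed_edits)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def choose_operator(distribution, rng):
    """Sample one operator with probability proportional to its frequency."""
    operators = list(distribution)
    weights = [distribution[op] for op in operators]
    return rng.choices(operators, weights=weights, k=1)[0]


# Hypothetical data: labels of edits mined from developer commits.
observed = ["append"] * 5 + ["replace"] * 3 + ["delete"] * 2
dist = operator_distribution(observed)

rng = random.Random(0)  # seeded so that runs are repeatable
picked = choose_operator(dist, rng)
```

A repair loop built this way would call `choose_operator` each time it mutates a candidate patch, so that operators developers use more often are tried more often than under a uniform choice.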

Finally, the third mechanism describes a repair technique based on patch diversity as a means to increase the quality of the best-performing patch in a patch population, and an evaluation of patch consolidation as a mechanism to increase patch quality. Some of the main findings in this dissertation are:

  • Using our open-source framework JaRFly, we were able to generate 68 patches for the 357 analyzed defects.
  • Fundamental test suite characteristics such as test suite coverage, size, provenance, and number of triggering test cases determine the quality of the resulting plausible patches generated by automated program repair.
  • An automatic program repair technique informed by a human-based mutation operator distribution increases the quality of the generated patches when compared to other APR techniques.
  • Current APR approaches typically lack diversity in their generated patches. We propose and evaluate a set of diversity-driven techniques that lead to an increase in the semantic diversity of the patch pool and in the quality of the best-performing patch in the patch population. Finally, we analyze how patch consolidation can be used to increase patch quality.

134 pages

Thesis Committee:
Claire Le Goues (Chair)
Christian Kästner
William Klieber (Software Engineering Institute)
David C. Shepherd (Virginia Commonwealth University)

James D. Herbsleb, Director, Institute for Software Research
Martial Hebert, Dean, School of Computer Science

