CMU-S3D-25-113
Software and Societal Systems Department
School of Computer Science, Carnegie Mellon University

Automated API Refactoring for Evolving Codebases
Daniel Rosa Ramos
August 2025
Ph.D. Thesis
Modern software development depends heavily on third-party libraries and frameworks, which expose their functionality through APIs and bring substantial productivity gains. However, as libraries evolve to meet new technical or market demands, clients must often adapt their code to accommodate breaking changes or even newer libraries. This form of software maintenance, known as API refactoring, is a time-consuming and error-prone task, which has led to significant interest in automating it. A common approach to automating API refactoring is to mine historical data from client repositories to extract match-replace rules. However, these approaches are limited by the availability of high-quality examples: many clients do not refactor in public, and those that do leave insufficient traces to learn from. This thesis presents a set of alternative methods for learning API migration rules without requiring large-scale mining of client code. Instead, we explore three complementary sources of information: documentation, the API development process, and natural language. First, we use API documentation to infer mappings between old and new APIs, which guide the synthesis of migration scripts. Second, we extract migration knowledge from the evolution of the library itself, especially from pull requests that introduce breaking changes and update internal tests. Finally, we show that large language models trained on natural language artifacts can be used to generate migration examples, which are then validated and generalized into reusable scripts. We operationalize these ideas in four refactoring tools, each targeting a different aspect of the problem. These tools combine program synthesis with machine learning to synthesize and apply migrations automatically. 
We evaluated our techniques on real-world Python libraries and synthetic benchmarks, showing that it is possible to automate migration effectively using only indirect sources of information, without requiring curated datasets or repository mining.
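
To illustrate what a match-replace rule of the kind mentioned in the abstract might look like, here is a minimal sketch in Python. It is not one of the thesis's tools and does not reflect their rule format. It assumes a hypothetical library `oldlib` whose function `fetch` was renamed to `get`, and it uses only the standard `ast` module (Python 3.9 or later, for `ast.unparse`).

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Match calls of the form `module.old_name(...)` and rename the function.

    `module`, `old_name`, and `new_name` are illustrative parameters,
    not part of any real migration tool.
    """

    def __init__(self, module, old_name, new_name):
        self.module = module
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        # Rewrite nested calls first (e.g. calls inside the arguments).
        self.generic_visit(node)
        func = node.func
        # Match step: the callee must be exactly `module.old_name`.
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == self.module
            and func.attr == self.old_name
        ):
            # Replace step: keep the arguments, change only the name.
            func.attr = self.new_name
        return node


def apply_rule(source, rule):
    """Parse `source`, apply the rule, and return the rewritten code."""
    tree = rule.visit(ast.parse(source))
    return ast.unparse(tree)


rule = RenameCall("oldlib", "fetch", "get")  # hypothetical API change
print(apply_rule("x = oldlib.fetch(url, timeout=3)", rule))
# prints: x = oldlib.get(url, timeout=3)
```

A rule like this handles only a pure rename. Breaking changes that reorder arguments or alter return types would need richer match and replace patterns.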
135 pages
Nicolas Christin, Head, Software and Societal Systems Department
Creative Commons License: CC-BY-NC (Attribution-NonCommercial)