CMU-CS-22-123
Computer Science Department
School of Computer Science, Carnegie Mellon University


Elevating Jupyter Notebook Maintenance Tooling
by Identifying and Extracting Notebook Structures

Yuan Jiang

M.S. Thesis

August 2022

Keywords: Jupyter notebook, maintenance tooling, notebook structure, static analysis, data dependency, classification, navigation, version, alternative

Data analysis is an exploratory, interactive, and often collaborative process. Computational notebooks have become a popular tool to support this process, in part because they interleave code, narrative text, and results. The exploratory nature of computational notebooks allows users to edit and execute parts of their program in any order. However, notebooks in practice are often criticized as hard to maintain and of low code quality, with problems such as unused or duplicated code and out-of-order code execution. Data scientists can benefit from better tool support when maintaining and evolving notebooks. We argue that identifying the structure of notebooks is central to such tool support. We present a lightweight and accurate approach to extract notebook structure and outline several ways this structure can improve maintenance tooling for notebooks, including navigation and finding common structural patterns. In addition, we investigate the history of notebooks and extend our approach to visualize how notebooks evolve over multiple revisions. We measure statistics of changed, added, and removed cells in Kaggle notebooks that have version histories. Our formative study shows that our visualizations can be useful for tracing and understanding changes as notebooks evolve and for identifying alternatives explored in specific stages of a data analysis pipeline across notebook histories.
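The abstract does not specify how changed, added, and removed cells are counted between two revisions. The following is one possible sketch, not necessarily the procedure used in the thesis. It assumes each revision is a parsed .ipynb file (nbformat 4 JSON), considers only code cells, and aligns the two cell sequences with Python's standard `difflib.SequenceMatcher`. The helper names `cell_sources` and `cell_change_stats` are hypothetical, and treating positionally paired cells inside a "replace" region as "changed" is a simplifying assumption.

```python
import difflib
from typing import Dict, List


def cell_sources(notebook: dict) -> List[str]:
    """Hypothetical helper: source text of each code cell in a parsed .ipynb (nbformat 4) dict."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # this sketch ignores markdown and raw cells
        src = cell.get("source", "")
        # nbformat stores a cell's source either as one string or as a list of lines
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def cell_change_stats(old: List[str], new: List[str]) -> Dict[str, int]:
    """Hypothetical helper: count unchanged, changed, added, and removed cells."""
    stats = {"unchanged": 0, "changed": 0, "added": 0, "removed": 0}
    # Align whole cells (not characters) between the two revisions
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "equal":
            stats["unchanged"] += n_old
        elif tag == "delete":
            stats["removed"] += n_old
        elif tag == "insert":
            stats["added"] += n_new
        else:  # "replace": pair cells by position; any surplus counts as added or removed
            paired = min(n_old, n_new)
            stats["changed"] += paired
            stats["removed"] += n_old - paired
            stats["added"] += n_new - paired
    return stats


if __name__ == "__main__":
    v1 = {"cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n"]},
        {"cell_type": "markdown", "source": "# EDA"},
        {"cell_type": "code", "source": "df = pd.read_csv('train.csv')"},
        {"cell_type": "code", "source": "df.head()"},
    ]}
    v2 = {"cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n"]},
        {"cell_type": "code", "source": "df = pd.read_csv('train.csv')"},
        {"cell_type": "code", "source": "df.describe()"},
        {"cell_type": "code", "source": "df.plot()"},
    ]}
    print(cell_change_stats(cell_sources(v1), cell_sources(v2)))
```

For the two example revisions, the first two code cells match exactly, and `df.head()` is paired with `df.describe()` as a changed cell while `df.plot()` counts as added, so the script prints `{'unchanged': 2, 'changed': 1, 'added': 1, 'removed': 0}`. Aligning whole cells keeps the sketch simple; a finer-grained notion of "changed" (for example, a similarity threshold on cell text) would be a further assumption.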

43 pages

Thesis Committee:
Christian Kästner (Chair)
Eunsuk Kang
Shurui Zhou (University of Toronto)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science