Human-Computer Interaction Institute
School of Computer Science, Carnegie Mellon University


Designing Effective History Support
for Exploratory Programming Data Work

Mary Beth Kery

August 2021

Ph.D. Thesis



Why did you model the data that way? How do we reproduce this plot? Programming for data science or modeling is a highly valued skill today. Yet when data workers experiment with data by coding – an intensely iterative process called exploratory programming – the details of what they try along the way to a solution tend to get lost. Since experimentation underlies essential workflows in data analysis, machine learning, AI, and visualization, this is a serious flaw. Ask any data worker today, and regardless of organization or years of experience, they have faced at least some results that cannot be readily reproduced, or mysterious data decisions missing a rationale. Modern best practices for managing experimentation take high human effort and still leave considerable room for error. With rising demand for responsibility and accountability in analyses and models, it is vital that people have proper support for documenting and answering why things were built the way they were.

This dissertation explores history tooling to support exploratory programming data work. To do this, we first conducted interviews, surveys, and design exercises with practitioners to learn about their needs and current workflows for experimentation. We contribute two studies: 1) a study detailing the mix of tools and ad-hoc methods data workers use to manage their experiments, and 2) an investigation of how data workers use computational notebooks for iteration. Our results point to two key barriers: the manual effort needed to collect experiment history today is unsustainable, and recovering semantic process information from a pile of history logs is far too cumbersome for practitioners to fit into their workflows. We aim to help practitioners record their experimentation without any manual effort and, moreover, quickly recover history facts to answer rationale questions about their work.

Next in this dissertation, we design, build, and test new interactive tools to meet these design goals, over a 5-year iterative human-centered design process. We contribute: 1) a series of 5 experiment history tool prototypes and 4 usability studies with practitioners, each of which illuminates a different aspect of the design space, 2) a set of novel visualization and interaction techniques for concisely summarizing history, 3) a fully implemented experiment history tool called Verdant, deployed in the wild as a computational notebook extension, and 4) an observational study where data workers use Verdant during exploratory programming and afterwards to answer rationale questions about the history of their experiments. With Verdant, participants were able to answer 98% of history questions about their work in 1 minute 26 seconds on average. All participants reported ways in which Verdant's style of history support would help in their own real-life work practices. In the conclusion of this thesis, we discuss the broader design space of experiment support tooling that rich history data enables.

307 pages

Thesis Committee:
Brad A. Myers (Chair)
Nikolas Martelaro
Bonnie E. John (Bloomberg LP)
Dominik Moritz (HCII / Apple Inc.)

Jodi Forlizzi, Head, Human-Computer Interaction Institute
Martial Hebert, Dean, School of Computer Science
