CMU-CS-25-124
Computer Science Department
School of Computer Science, Carnegie Mellon University
Analyzing Novice Debugging Behavior Using
Programming Process Data
Archan Das
M.S. Thesis
August 2025
Keywords:
CS Education, debugging, software engineering, process data
Background. Debugging is an important part of the software development workflow. To improve the techniques and instruction of debugging, we need to understand the cognitive process through which programmers debug. Previous research has used a variety of methodologies to study the debugging process, including concurrent verbal protocols, quantitative analyses, and neural imaging. This research has established the sequence of cognitive phases that programmers progress through while debugging. One frontier in this research is the use of process data to study debugging. Process data consists of logs collected from integrated development environments (IDEs) that record how programmers work on their code over time.
Aim. We aim to: a) create a framework for analyzing process data captured from an IDE, b) analyze process data collected from a population of introductory programming students to observe patterns in student debugging behavior, and c) use the collected data to identify efficient and inefficient habits exhibited by students while debugging.
Data. We collected process data across three exercises from 315 students in an introductory programming class. This data consists of an event log of every keystroke, code execution, and submission attempt students made while working on their exercises.
Methods. For each student, we extracted a timeline of cognitive debugging phases from the process data and validated this phase model with a panel of experts. We tested the effect of three behavioral features (use of print statements, time in the locate-error phase, and functional edits per cycle) against a novel measure of debugging struggle (the count of run-program events), but found the results to be inconclusive. We also looked for population-wide patterns in the extracted cognitive phases.
Results. We found that the frequency of print statements had a positive correlation with debugging struggle across all exercises. Increased time spent in the locate-error phase had a statistically significant effect on student debugging struggle in some exercises, but not in others. Subjects tended to make faster, more focused repairs to their code later in debugging episodes. Finally, debugging struggle had a weak negative correlation with average exam scores in the course.
Conclusion. Results suggest that students should be encouraged to spend more time reasoning about their code while debugging. Process data also shows promise as a tool for evaluating and giving feedback on the student debugging process at scale. In addition, our framework may be broadly useful for experiments on student debugging behavior, especially when large subject populations make alternative methods impractical.
67 pages
Thesis Committee:
Mark Stehlik (Co-Advisor)
David Kosbie (Co-Advisor)
Roy Maxion (Subject Matter Expert)
Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science