Computer Science Department
School of Computer Science, Carnegie Mellon University
Dataflow Analysis-Based Dynamic Monitoring
Michelle Leah Goodstein
Despite the best efforts of programmers and programming systems researchers, software bugs continue to be problematic. This thesis will focus on a new framework for performing dynamic (runtime) parallel program analysis to detect bugs and security exploits. Existing dynamic analysis tools have focused on monitoring sequential programs. Parallel programs are susceptible to a wider range of possible errors than sequential programs, making them even more in need of online monitoring. Unfortunately, monitoring parallel applications is difficult due to inter-thread data dependences and relaxed memory consistency models.
This thesis presents dataflow analysis-based dynamic parallel monitoring, a novel softwarebased parallel analysis framework that avoids relying on strong consistency models or detailed inter-thread dependence tracking. Using insights from dataflow analysis, our frameworks enable parallel applications to be monitored concurrently without capturing a total order of application instructions across parallel threads. This thesis has three major contributions: Butterfly Analysis and Chrysalis Analysis, as well as extensions to both enabling explicit tracking of uncertainty.
Butterfly Analysis is the first dataflow analysis-based dynamic parallel monitoring framework. Unlike existing tools, which frequently assumed sequential consistency and/or access to a total order of application events, Butterfly Analysis does not rely on strong consistency models or detailed inter-thread dependence tracking. Instead, we only assume that events in the distant past on all threads have become visible; we make no assumptions on (and avoid the overheads of tracking) the relative ordering of more recent events on other threads. To overcome the potential state explosion of considering all the possible orderings among recent events, we adapt two v techniques from static dataflow analysis, reaching definitions and available expressions, to this new domain of dynamic parallel monitoring. Significant modifications to these techniques are proposed to ensure the correctness and efficiency of our approach. We show how our adapted analysis can be used in two popular memory and security tools. We prove that our approach does not miss errors, and sacrifices precision only due to the lack of a relative ordering among recent events.
While Butterfly Analysis offers many advantages, it ignored one key source of ordering information which significantly affected its false positive rate: explicit software synchronization, and the corresponding high-level happens-before arcs. This led to the development of Chrysalis Analysis, which generalizes the Butterfly Analysis framework to incorporate explicit happensbefore arcs resulting from high-level synchronization within a monitored program. We show how to adapt two standard dataflow analysis techniques and two memory and security lifeguards to Chrysalis Analysis, using novel techniques for dealing with the many complexities introduced by happens-before arcs. Our security tool implementation shows that Chrysalis Analysis matches the key advantages of Butterfly Analysis while significantly reducing the number of false positives, by an average of 97%.
While Chrysalis Analysis greatly improved upon Butterfly Analysis' precision, it was unable to separate potential errors due to analysis uncertainty from true errors. We extend both Butterfly Analysis and Chrysalis Analysis to incorporate an uncertain state into their metadata lattice, and provide new guarantees that true error states are now precise. We experimentally evaluate a prototype and demonstrate that we effectively isolate analysis uncertainty. We also explore possible dynamic adaptations to the presence of uncertainty.
In all cases, we have shown that our frameworks are provably guaranteed to never miss an error and sacrifice precision only due to the lack of a relative ordering among recent events, and present experimental results on performance and precision from our implementations.