CMU-CS-12-116
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-12-116

Hybrid Fuzz Testing: Discovering Software Bugs
via Fuzzing and Symbolic Execution

Brian S. Pak

May 2012

M.S. Thesis

CMU-CS-12-116.pdf


Keywords: Software, Security, Program Testing, Hybrid Fuzzing, Input Generation

Random mutational fuzz testing (fuzzing) and symbolic executions are program testing techniques that have been gaining popularity in the security research community. Fuzzing finds bugs in a target programby natively executing it with random inputs while monitoring the execution for abnormal behaviors such as crashes. While fuzzing may have a reputation of being able to explore deep into a program's state space efficiently, naïve fuzzers usually have limited code coverage for typical programs since unconstrained random inputs are unlikely to drive the execution down many different paths. In contrast, symbolic execution tests a program by treating the program's input as symbols and interpreting the program over such symbolic inputs. Although in theory symbolic execution is guaranteed to be effective in achieving code coverage if we explore all possible paths, this generally requires exponential resource and is thus not practical for many real-world programs.

This thesis presents our attempt to attain the best of both worlds by combining fuzzing with symbolic execution in a novel manner. Our technique, called hybrid fuzzing, first uses symbolic execution to discover frontier nodes that represent unique paths in the program. After collecting as many frontier nodes as possible under a user-specifiable resource constraint, it transits to fuzz the program with preconditioned random inputs, which are provably random inputs that respect the path predicate leading to each frontier node. Our current implementation supports programs with linear path predicates and can automatically generate preconditioned random inputs from a polytope model of the input space extracted from binaries. These preconditioned random inputs can then be used with any fuzzer. Experiments show that our implementation is efficient in both time and space, and the inputs generated by it are able to gain extra breadth and depth over previous approaches.

80 pages



Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu