CMU-CS-22-109
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-22-109

Static Analysis of Probabilistic Programs:
An Algebraic Approach

Di Wang

Ph.D. Thesis

May 2022

CMU-CS-22-109.pdf


Keywords: Probabilistic programs, Program analysis, Denotational semantics, Markov algebras, Control-flow hyper-graphs

Probabilistic programs are programs that can draw random samples from probability distributions and involve random control flows. They are becoming increasingly popular and have been applied in many areas such as algorithm design, cryptographic protocols, uncertainty modeling, and statistical inference. Formal reasoning about probabilistic programs comes with unique challenges, because it is usually not tractable to obtain the exact result distributions of probabilistic programs. This thesis focuses on an algebraic approach for static analysis of probabilistic programs. The thesis first provides a brief background on measure theory and introduces an imperative arithmetic probabilistic programming language Appl with a novel hyper-graph program model. Second, the thesis presents an algebraic denotational semantics for Appl that can be instantiated with different models of nondeterminism. The thesis also develops a new model of nondeterminism that involves nondeterminacy among state transformers and presents a domain-theoretic characterization of the new model. Based on the algebraic denotational semantics, the thesis proposes a general algebraic framework PMAF for designing, implementing, and proving the correctness of static analyses of probabilistic programs. The thesis also includes a concrete static analysis–central-moment analysis for cost accumulators in probabilistic programs–and elaborates implementation strategies to improve the usability and efficiency of the analysis. There is a gap between the general PMAF framework and the central-moment analysis, in the sense that the former is based on abstraction and iterative approximation, but the latter is based on constraint solving. The thesis provides some preliminary results on bridging the gap, via the development of novel regular hyper-path expressions, which finitely represent possibly-infinite hyper-paths on control-flow hyper-graphs of probabilistic programs without nondeterminism, and DMKAT algebraic structures, which can be used to interpret regular hyper-path expressions. Future directions for extending the research covered by this thesis include developing an algebraic static-analysis framework based on DMKAT and instantiating it to perform central-moment analysis, generalizing DMKAT with support for nondeterminism and formalizing an equational axiomatization for the generalized algebraic framework, as well as applying the analysis framework developed in the thesis to the analysis and verification of statistical guarantees for Bayesian probabilistic programming.

185 pages

Thesis Committee:
Jan Joffmann (Chair)
Matt Fredrikson
Stephen Brookes
Thomas Reps (University of Wisconsin-Madison)
Hongfei Fu (Shanghai Jiao Tong University)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu