CMU-CS-25-141
Computer Science Department
School of Computer Science, Carnegie Mellon University



Understanding Representations of Humans in Generative Image
Modeling Through Discrete Counterfactual Prompt Optimization

Joshua Nathaniel Williams

Ph.D. Thesis

September 2025



Keywords: Generative Image Modeling, Prompt Optimization, Explainability, Discrete Optimization

Text-to-image (T2I) models are a common, publicly accessible class of generative model. Because of their widespread use, it is crucial to develop tools and methods that help us understand how these models choose to represent their subjects, particularly human subjects. By comparing generated images across sets of carefully constructed prompts, we can uncover patterns in how these models represent various groups of people. These analyses often surface specific prompts that elicit representational asymmetries; for example, the prompt "A person with glasses" is more likely to generate a male-presenting subject than a female-presenting one.

While many such patterns are innocuous, some harmful representational biases emerge that require intervention by developers. Approaches that rely on predefined prompt templates or fixed identity categories are effective for benchmarking known issues, yet they may unintentionally create blind spots shaped by the researchers' own backgrounds and experiences. One person's life experiences may lead them to expect (and therefore design experiments to evaluate) specific representations by the model, while another person may expect a completely different set of representations and harms that the former would not consider. These differences in experience result in a wide range of potential blind spots in safety evaluations.

This thesis develops a variety of approaches, grounded in counterfactual and contrastive analyses, that act as general tools for surfacing new hypotheses about representational asymmetries and harms in generative modeling, addressing these blind spots and complementing existing evaluations. We first demonstrate that effective explanations for simple classifiers require incorporating knowledge of the underlying ground-truth data distribution, without which explanations and discoveries are prone to spurious insights. We posit a simple change to the implicit graphical model that underlies counterfactual explainability and propose a new metric that explicitly incorporates this distributional awareness.

The insights from this method then guide our approach to counterfactual explainability in the T2I setting. By reviewing a variety of discrete prompt optimization methods, we show how to define and encode this distributional awareness of captioned data in the optimization process. We support these methods by introducing an approach for multi-objective optimization across multiple language models, each with its own discrete tokenizer and text embeddings. Using the insights and methods developed throughout this thesis, we conclude by presenting an unsupervised strategy for discovering candidate prompts that encode representational asymmetries, many of which have not yet been discussed in the broader literature. Understanding and relating the learned speech and writing patterns of generative models to their outputs allows us to better understand why models represent people the way that they do, and improves our ability to target specific behaviors as we train and evaluate generative models.

130 pages

Thesis Committee:
J. Zico Kolter (Chair)
Hoda Heidari
Aditi Raghunathan
Sarah Laszlo (Visa)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Creative Commons: CC-BY (Attribution)
