COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-22-121
Computer Science Department
School of Computer Science, Carnegie Mellon University

(Un)Fairness Along the AI Pipeline
Problems and Solutions

Emily Black

Ph.D. Thesis

July 2022

Keywords: Artificial Intelligence, Machine Learning, Deep Networks, Neural Networks, Deep Learning, Ethics, Fairness, Accountability, Explainability, Public Policy, machine learning pipeline, AI pipeline, stability, consistency, inconsistency, ensembling, counterfactual explanations, leave-one-out unfairness, tax policy, vertical equity, model multiplicity

Artificial Intelligence (AI) systems now influence decisions impacting every aspect of people's lives, from the news articles they read to whether they receive a loan. While the use of AI may lead to great accuracy and efficiency in the making of these important decisions, recent news and research reports have shown that AI models can act unfairly, from exhibiting gender bias in hiring models to racial bias in recidivism prediction systems.

This thesis explores new methods for understanding and mitigating fairness issues in AI by considering how choices made throughout the process of creating an AI system (i.e., the modeling pipeline) impact fairness behavior. First, I will show how considering a model's end-to-end pipeline allows us to expand our understanding of unfair model behavior. In particular, my work introduces a connection between AI system stability and fairness by demonstrating how instability in certain parts of the modeling pipeline, namely the learning rule, can lead to unfairness by having important decisions rely on arbitrary modeling choices.

Second, I will discuss how considering ML pipelines can help us expand our toolbox of bias mitigation techniques. In a case study investigating equity with respect to income in tax auditing practices, I will demonstrate how interventions made along the AI creation pipeline, even those not related to fairness on their face, can not only be effective for increasing fairness, but can often reduce tradeoffs between predictive utility and fairness.

Finally, I will close with an overview of the benefits and dangers of the flexibility that the AI modeling pipeline affords practitioners in the creation of their models, including a discussion of the legal repercussions of this flexibility, which I call model multiplicity.

188 pages

Thesis Committee:
Matt Fredrikson (Chair)
Alexandra Chouldechova
Rayid Ghani
Hoda Heidari
Solon Barocas (Microsoft Research)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science
