CMU-CS-25-152
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-25-152

Surprising Interactions Between Filters In Equi-Join Processing

Mihir Khare

M.S. Thesis

December 2025

CMU-CS-25-152.pdf


Keywords: Database management systems, join pre-filtering, robust query processing, adaptive query processing, bloom filters

The current era of data storage is defined by the widespread adoption of data lakes, and the disaggregation of storage and compute hardware. Modern database management systems (DBMSs) are often operating on large volumes of data stored in object stores (like Amazon's S3), open file formats (like Apache's Parquet), orotherwise have outdated or nonexistent statistics. In join-heavy analytical workloads, the traditional approach of optimizing query plans to minimize the cost of joins breaks down if the available information to estimate cardinalities and costs is inaccurate. In recent years, a class of techniques known as "join pre-filtering" has gained focus as an attempt to reduce the reliance on a good optimizer for minimizing join costs by reducing the inputs of joins to the minimum set of tuples needed to produce the output. This thesis explores the current state-of-the-art in pre-filtering, and concludes that more work must be done to create a strategy that is applicable across a wide range of workloads. First, we provide an overview of the fundamental concepts of pre-filtering, and describe the key design decisions of and differences between currently studied techniques. We then evaluate two pre-filtering methods, min-max filters and Bloom filters, using a modified implementation of Dynamic Predicate Transfer (RPT+), a leading contemporary technique for join pre-filtering. Our analysis focuses on the performance interactions between these filter types. Finally, we discuss the implications of the results for modern DBMSs, and future directions of study in this area.

49 pages

Thesis Committee:
Jignesh Patel (Chair)
Andrew Pavlo

Jignesh Patel, Interim Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: School of Computer Science

This page maintained by reports@cs.cmu.edu