CMU-CS-20-143
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-20-143

Filter representation in Vectorized Query Execution

Amadou Latyr Ngom

M.S. Thesis

July 2020

CMU-CS-20-143.pdf


Keywords: Vectorization, Compilation, Filter, SIMD

Advances in memory capacity have allowed Database Management Systems (DBMSs) to store large amounts of data in memory, there by shifting the performance bottleneck of query execution from disk accesses to CPU efficiency (i.e., instruction count and cycles per instruction). One technique used to achieve such efficiency in analytical applications is batch-oriented processing or vectorization: it reduces interpretation overhead, improves cache locality, and allows for efficient loop optimizations (e.g., loop unrolling, SIMD vectorization). For each vector (i.e., a batch of tuples),the DBMS needs to track the set of valid tuples after a predicate application. To that end, systems employ one of two data structures, or filter representations: Selection Vectors (SelVecs) or Bitmaps. In this work, we analyze each approach's strengths and weaknesses to provide recommendations on how to implement vectorized operations. Through a wide range of micro-benchmarks, we determine that the optimal implementation strategy is a function of many factors: the cost of iterating through tuples, the cost of the operation itself, and how amenable it is to SIMD vectorization. Our analysis shows that Bitmaps perform better for operations that can be efficiently vectorized using SIMD instructions but that SelVecsper form better on all other operations due to a cheaper iteration logintc.

54 pages

Thesis Committee:
Andy Pavlo (Chair)
Todd C. Mowry

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu