Computer Science Department
School of Computer Science, Carnegie Mellon University
Data Mining on an OLTP System (Nearly) for Free
Erik Riedel, Christos Faloutsos, Gregory Ganger, David Nagle
June 1999
Keywords: Input/output devices, database applications, special-purpose
and application-based systems, input/output and data communications
This paper proposes a scheme for scheduling disk requests that takes
advantage of the ability of high-level functions to operate directly
at individual disk drives. We show that such a scheme makes it
possible to support a Data Mining workload on an OLTP system almost
for free: there is only a small impact on the throughput and response
time of the existing workload. Specifically, we show that an OLTP
system has the disk resources to provide a consistent one third of its
sequential bandwidth to a background Data Mining task with close to
zero impact on OLTP throughput and response time at high transaction
loads. At low transaction loads, we show much lower impact than
observed in previous work. This means that a production OLTP system
can be used for Data Mining tasks without the expense of a second
dedicated system. Our scheme takes advantage of close interaction with
the on-disk scheduler by reading blocks for the Data Mining workload
as the disk head "passes over" them while satisfying demand blocks
from the OLTP request stream. We show that this scheme provides a
consistent level of throughput for the background workload even at
very high foreground loads. Such a scheme is most beneficial when
combined with an Active Disk environment, which also lets the background
Data Mining application use the processing power and memory available
directly on the disk drives.
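
The idea of reading background blocks as the head "passes over" them can be illustrated with a minimal sketch. This is not the scheduler evaluated in the paper. It assumes a toy disk model in which time is counted in whole sector-times, and only sectors on the destination track that rotate under the head while it waits for the demand sector are considered. The function name `free_blocks_during_request` and all of its parameters are hypothetical.

```python
def free_blocks_during_request(head_sector, seek_time, target_track,
                               target_sector, wanted, sectors_per_track):
    """Return background sectors on the target track that pass under the
    head during the rotational latency of one foreground (OLTP) request.

    Assumptions (illustrative only): seek_time is an integer number of
    sector-times, and `wanted` is a set of (track, sector) pairs still
    needed by the background scan.
    """
    # The platter keeps rotating during the seek, so the head arrives
    # over a different sector than the one it started above.
    arrival = (head_sector + seek_time) % sectors_per_track
    # Sectors that rotate past before the demand sector reaches the head.
    wait = (target_sector - arrival) % sectors_per_track
    picked = []
    for i in range(wait):
        sector = (arrival + i) % sectors_per_track
        if (target_track, sector) in wanted:
            picked.append(sector)
    return picked


# Example: 8 sectors per track, head starts over sector 0, seek takes
# 2 sector-times, and the OLTP request is for sector 6 on track 5.
wanted = {(5, 3), (5, 5), (5, 7), (4, 4)}
picked = free_blocks_during_request(0, 2, 5, 6, wanted, 8)
wanted -= {(5, s) for s in picked}
```

In this example, sectors 2 through 5 pass under the head before the demand sector arrives, so the background scan picks up sectors 3 and 5 without delaying the foreground request. Sector 7 on the same track and the block on track 4 remain in `wanted` for a later opportunity.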