|
CMU-CS-97-141
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-97-141
A Transactional Approach to Redundant Disk Array Implementation
William V. Courtright II
May 1997
Ph.D. Thesis (Department of Electrical and Computer Engineering)
CMU-CS-97-141.ps
CMU-CS-97-141.pdf
Keywords: Disk array, storage, architecture, simulation, directed
acyclic graph, software
Redundant disk arrays are a popular method of improving the
dependability and performance of disk storage and an ever-increasing
number of array architectures are being proposed to balance cost,
performance, and dependability. Despite their differences, there is a
great deal of commonality between these architectures; unfortunately,
it appears that current implementations are not able to effectively
exploit this commonality due to their ad hoc approach to error
recovery. Such techniques rely upon a case-by-case analysis of errors,
a manual process that is tedious and prone to mistakes. For each
distinct error scenario, a unique procedure is implemented to remove
the effects of the error and complete the affected
operation. Unfortunately, this form of recovery is not easily extended
because the analysis must be repeated as new array operations and
architectures are introduced.
Transaction-processing systems utilize logging techniques to mechanize
the process of recovering from errors. However, the expense of
guaranteeing that all operations can be undone from any point in their
execution is too expensive to satisfy the performance and resource
requirements of redundant disk arrays.
This dissertation describes a novel programming abstraction and
execution mechanism based upon transactions that simplifies
implementation. Disk array algorithms are modeled as directed acyclic
graphs: the nodes are actions such as "XOR" and the arcs represent
data and control dependencies between them. Using this abstraction, we
implemented eight array architectures in RAIDframe, a framework for
prototyping disk arrays. Code reuse was consistently above 90%. The
additional layers of abstraction did not affect the response time and
throughput characteristics of RAIDframe; however, RAIDframe consumes
60% more CPU cycles than a hand-crafted non-redundant implementation.
RAIDframe employs roll-away error recovery, a novel scheme for
mechanizing the execution of disk array algorithms without requiring
that all actions be undoable. A barrier is inserted into each
algorithm: failures prior to the barrier result in rollback, relying
upon undo information. Once the barrier is crossed, the algorithm
rolls forward to completion, and undo records are unnecessary. Experiments
revealed this approach to have identical performance to that of
non-logging schemes.
253 pages
|