Computer Science Department
School of Computer Science, Carnegie Mellon University
A Distributed and Scalable Peer-to-Peer
In this thesis, we present the design, implementation and evaluation of a distributed CDS that meets the scalability and searchability requirements simultaneously. Nodes in the CDS organize themselves into a structured P2P overlay network in a distributed fashion using a DHT. The CDS supports the search of highly dynamic contents represented using a descriptive attribute-value based naming scheme. We ensure scalability by deploying efficient registration and query algorithms using Rendezvous Points (RPs). To maintain high system throughput under realistic skewed load, such as flash crowds, we designed a novel mechanism that uses Load Balancing Matrices (LBMs) to eliminate hot-spots by balancing both registration and query load in a fully distributed fashion. To support complex queries, we developed two new distributed data structures, namely the Range Search Tree and Distributed KD-Tree. These data structures coupled with a set of lightweight tree-based protocols, allow the CDS to support range and similarity queries efficiently without creating bottlenecks.
To evaluate the CDS, we implemented a comprehensive simulator. Our extensive simulation results confirmed the effectiveness of the CDS design. In addition, we implemented a prototype of the CDS that includes the RP-based registration and query mechanisms with distributed load balancing. Our deployment of the prototype on the Internet (Planet Lab testbed) and its integration with a content-based music information retrieval system verified the system's feasibility and its applicability to a large category of real world applications.