|
CMU-CS-01-128
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-01-128
Locality in Search Engine Queries and Its Implications for Caching
Yinglian Xie, David O'Hallaron
May 2001
CMU-CS-01-128.ps
CMU-CS-01-128.pdf
Keywords: Search engines, query locality, caching, trace
analysis, user lexicon, prefetching
Caching is a popular technique for reducing both server load and user
response time in distributed systems. In this paper, we are interested
in the question of whether caching might be effective for search
engines as well. We study two real search engine traces by examining
query locality and its implications for caching. Our trace analysis
results show that: (1) Queries have significant locality, with query
frequency following a Zipf distribution. Very popular queries are
shared among different users and can be cached at servers or proxies,
while 16% to 22% of the queries are from the same users and should be
cached at the user side. Multiple-word queries are shared less and
should be cached mainly at the user side. (2) If caching is to be done
at the user side, short-term caching for hours will be enough to cover
query temporal locality, while server/proxy caching should be based on
longer periods such as days. (3) Most users have small lexicons when
submitting queries. Frequent users who submit many search requests
tend to reuse a small subset of words to form queries. Thus, with
proxy or user side caching, prefetching based on user lexicon looks
promising.
21 pages
|