DHTs over peer clusters for distributed information retrieval

Odysseas Papapetrou, Wolf Siberski, Wolf Tilo Balke, Wolfgang Nejdl

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

6 Citations (Scopus)

Abstract

Distributed Hash Tables (DHTs) are very efficient for querying based on key lookups, provided that each individual peer has to register only a small number of keys. However, building huge term indexes, as required for IR-style keyword search, is impractical with plain DHTs. Because document term vocabularies are large, joining peers cause huge numbers of key inserts and, subsequently, large numbers of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance. We show that this can be achieved by combining DHTs with peer clustering. Peers are first clustered into communities, each community having a representative super-peer. Then all occurrences of a term in a community are published to the global DHT in a batch by the representative super-peer. Our evaluation shows that this reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
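The batching idea in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `publish_plain` and `publish_clustered`, the peer and community identifiers, and the cost model (one DHT message per insert, with communication inside a community not counted) are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def publish_plain(peer_terms):
    """Plain DHT (assumed model): every peer inserts each of its terms itself,
    so the cost is one message per (peer, term) pair."""
    index = defaultdict(set)
    messages = 0
    for peer, terms in peer_terms.items():
        for term in terms:
            index[term].add(peer)
            messages += 1
    return index, messages


def publish_clustered(peer_terms, communities):
    """Clustered variant (assumed model): the super-peer of each community
    collects all occurrences of a term and publishes them in one batch,
    so the cost is one message per (community, term) pair.
    Messages between peers and their super-peer are NOT counted here."""
    index = defaultdict(set)
    messages = 0
    for members in communities.values():
        batch = defaultdict(set)
        for peer in members:
            for term in peer_terms[peer]:
                batch[term].add(peer)
        for term, peers in batch.items():
            index[term] |= peers  # one batched insert for this term
            messages += 1
    return index, messages


# Hypothetical toy data: four peers, two communities.
peer_terms = {
    "p1": {"dht", "index", "peer"},
    "p2": {"dht", "index", "query"},
    "p3": {"dht", "peer", "query"},
    "p4": {"cluster", "index"},
}
communities = {"c1": ["p1", "p2", "p3"], "c2": ["p4"]}

plain_index, plain_msgs = publish_plain(peer_terms)
clustered_index, clustered_msgs = publish_clustered(peer_terms, communities)

print(plain_msgs, clustered_msgs)
print(plain_index == clustered_index)
```

In this toy setting both variants build the same term-to-peers index, while the clustered variant sends fewer DHT messages because terms shared within a community are published once. The size of the saving depends entirely on how much vocabulary the members of a community share; the order-of-magnitude figure in the abstract comes from the authors' evaluation, not from this sketch.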

Original language: English
Title of host publication: Proceedings - 21st International Conference on Advanced Information Networking and Applications, AINA 2007
Pages: 84-93
Number of pages: 10
Publication status: Published - 25 Sep 2007
Externally published: Yes
Event: 21st International Conference on Advanced Information Networking and Applications, AINA 2007 - Niagara Falls, ON, Canada
Duration: 21 May 2007 to 23 May 2007

Conference

Conference: 21st International Conference on Advanced Information Networking and Applications, AINA 2007
Country/Territory: Canada
City: Niagara Falls, ON
Period: 21/05/07 to 23/05/07
