Text clustering for peer-to-peer networks with probabilistic guarantees

Odysseas Papapetrou, Wolf Siberski, Norbert Fuhr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

3 Citations (Scopus)

Abstract

Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed environments, such as peer-to-peer networks, current clustering algorithms fail to scale. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 100000 peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.

Original language: English
Title of host publication: Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings
Pages: 293-305
Number of pages: 13
Publication status: Published - 20 May 2010
Externally published: Yes
Event: 32nd European Conference on Information Retrieval, ECIR 2010 - Milton Keynes, United Kingdom
Duration: 28 Mar 2010 to 31 Mar 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5993 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 32nd European Conference on Information Retrieval, ECIR 2010
Country: United Kingdom
City: Milton Keynes
Period: 28/03/10 to 31/03/10
