TY - GEN
T1 - Text clustering for peer-to-peer networks with probabilistic guarantees
AU - Papapetrou, Odysseas
AU - Siberski, Wolf
AU - Fuhr, Norbert
PY - 2010/5/20
Y1 - 2010/5/20
AB - Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed environments, such as peer-to-peer networks, current clustering algorithms fail to scale. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 100000 peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.
UR - http://www.scopus.com/inward/record.url?scp=77952311353&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12275-0_27
DO - 10.1007/978-3-642-12275-0_27
M3 - Conference contribution
AN - SCOPUS:77952311353
SN - 3642122744
SN - 9783642122743
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 293
EP - 305
BT - Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings
T2 - 32nd European Conference on Information Retrieval, ECIR 2010
Y2 - 28 March 2010 through 31 March 2010
ER -