TY - GEN
T1 - XStreamCluster
T2 - 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011
AU - Papapetrou, Odysseas
AU - Chen, Ling
PY - 2011/4/28
Y1 - 2011/4/28
N2 - XML clustering finds many applications, ranging from storage to query processing. However, existing clustering algorithms focus on static XML collections, whereas modern information systems frequently deal with streaming XML data that needs to be processed online. Streaming XML clustering is a challenging task because of the high computational and space efficiency requirements implicated for online approaches. In this paper we propose XStreamCluster, which addresses the two challenges using a two-layered optimization. The bottom layer employs Bloom filters to encode the XML documents, providing a space-efficient solution to memory usage. The top layer is based on Locality Sensitive Hashing and contributes to the computational efficiency. The theoretical analysis shows that the approximate solution of XStreamCluster generates similarly good clusters as the exact solution, with high probability. The experimental results demonstrate that XStreamCluster improves both memory efficiency and computational time by at least an order of magnitude without affecting clustering quality, compared to its variants and a baseline approach.
AB - XML clustering finds many applications, ranging from storage to query processing. However, existing clustering algorithms focus on static XML collections, whereas modern information systems frequently deal with streaming XML data that needs to be processed online. Streaming XML clustering is a challenging task because of the high computational and space efficiency requirements implicated for online approaches. In this paper we propose XStreamCluster, which addresses the two challenges using a two-layered optimization. The bottom layer employs Bloom filters to encode the XML documents, providing a space-efficient solution to memory usage. The top layer is based on Locality Sensitive Hashing and contributes to the computational efficiency. The theoretical analysis shows that the approximate solution of XStreamCluster generates similarly good clusters as the exact solution, with high probability. The experimental results demonstrate that XStreamCluster improves both memory efficiency and computational time by at least an order of magnitude without affecting clustering quality, compared to its variants and a baseline approach.
UR - http://www.scopus.com/inward/record.url?scp=79955114145&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-20149-3_36
DO - 10.1007/978-3-642-20149-3_36
M3 - Conference contribution
AN - SCOPUS:79955114145
SN - 9783642201486
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 496
EP - 510
BT - Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, Proceedings
Y2 - 22 April 2011 through 25 April 2011
ER -