Evaluation of the Sample Clustering Process on Graphs

Research output: Contribution to journal, Article, Academic, peer-reviewed

6 Citations (Scopus)
118 Downloads (Pure)

Abstract

An increasing number of networks are large-scale and continuously growing, so clustering them in their entirety can be intractable. A feasible way to overcome this problem is to sample a representative subgraph and exploit its clustering structure (namely, the sample clustering process). However, current studies leave two issues unaddressed. The first is how to evaluate the clustering quality of the entire sample clustering process. The second is that networks can have multiple ground-truths, which makes evaluating the clustering results in such a scenario a challenging task. In this paper, we first use the set-matching methodology to quantitatively evaluate how the clusters of the sampled counterpart correspond to the ground-truth(s) in the original graph, and propose several new quality metrics that capture differences in clustering structure in various aspects. Second, we put forward an evaluation framework for the general problem of evaluating clustering quality on graph samples. Extensive experiments on various synthetic and real-world graphs demonstrate that our new quality metrics are more accurate and insightful for sample clustering evaluation than conventional metrics (e.g., NMI). The evaluation framework is thus effective and practical for assessing the clustering quality of the sample clustering process on massive graphs.

Original language: English
Article number: 8666073
Pages (from-to): 1333-1347
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 32
Issue number: 7
Status: Published - 1 Jul 2020

Funding

This research of Dr. Zhang is supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61521003) and the National Key Research and Development Program of China (Grant No. 2016YFB0800101). The authors would like to thank Kaijie Zhu, Wilco van Leeuwen, Anil Yaman, and Fucai Chen for a careful reading of the manuscript and many valuable comments.

Funder: Funder number
National Key Research and Development Program of China: 2016YFB0800101
National Natural Science Foundation of China: 61521003
