Using internal evaluation measures to validate the quality of diverse stream clustering algorithms

M. Hassani, T. Seidl

Research output: Contribution to journal, Article, Academic, peer-reviewed


Abstract

Measuring the quality of a clustering algorithm has been shown to be as important as the algorithm itself. It is a crucial part of choosing the clustering algorithm that performs best on a given input dataset. Streaming input data have many features that make them much more challenging than static data: they are endless, varying, and arrive at high speed. This has raised new challenges for clustering algorithms as well as for their evaluation measures. Until now, external evaluation measures have been used exclusively for validating stream clustering algorithms. While external validation requires a ground truth, which is not available in most applications, particularly in the streaming case, internal clustering validation is efficient and realistic. In this article, we analyze the properties and performance of eleven internal clustering measures. In particular, we apply these measures to carefully synthesized stream scenarios to reveal how they react to clusterings of evolving data streams produced by both k-means-based and density-based clustering algorithms. A series of experiments shows that, unlike in the static-data case, the Calinski-Harabasz index performs best in coping with common aspects and errors of stream clustering for k-means-based algorithms, while the revised validity index performs best for density-based ones.
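To make the idea of an internal measure concrete, the following is a minimal sketch of the Calinski-Harabasz index in its standard textbook form: the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom. It is not the authors' implementation. The function names, the toy window of points, and the two labelings are assumptions made up for illustration, and the sketch scores a single static window rather than an evolving stream.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster dispersion, W the within-cluster dispersion,
    n the number of points, and k the number of clusters. Higher is better.
    No ground-truth labels are needed, only the clustering itself.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    n, k = len(points), len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")

    overall = centroid(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)

    if within == 0.0:
        # Every point coincides with its cluster centroid.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical window of a 2-D stream: two well-separated groups.
window = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# A labeling that matches the two groups, and one that mixes them.
good = calinski_harabasz(window, [0, 0, 0, 0, 1, 1, 1, 1])
bad = calinski_harabasz(window, [0, 1, 0, 1, 0, 1, 0, 1])
print(good, bad)
```

The labeling that follows the two groups scores far higher than the mixed one, which is the behavior an internal measure is expected to show when a clustering degrades.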
Original language: English
Pages (from-to): 171–183
Number of pages: 13
Journal: Vietnam Journal of Computer Science
Volume: 4
Issue number: 3
Status: Published - 1 Aug 2017
