Using internal evaluation measures to validate the quality of diverse stream clustering algorithms

M. Hassani, T. Seidl

Research output: Contribution to journal › Article › Academic › peer-review



Measuring the quality of a clustering algorithm has been shown to be as important as the algorithm itself: it is a crucial part of choosing the clustering algorithm that performs best for given input data. Streaming input data have many features that make them much more challenging than static data: they are endless, evolving, and arriving at high speed. This raises new challenges both for clustering algorithms and for their evaluation measures. Until now, external evaluation measures were used exclusively for validating stream clustering algorithms. Yet external validation requires a ground truth, which is not available in most applications, particularly in the streaming case, whereas internal clustering validation is efficient and realistic. In this article, we analyze the properties and performance of eleven internal clustering measures. In particular, we apply these measures to carefully synthesized stream scenarios to reveal how they react to clusterings on evolving data streams, using both k-means-based and density-based clustering algorithms. A series of experimental results shows that, unlike in the static case, the Calinski-Harabasz index copes best with the common aspects and errors of stream clustering for k-means-based algorithms, while the revised validity index performs best for density-based ones.
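The article's experiments use MOA with stream scenarios; as a minimal static-data sketch of the key idea, the example below (not from the article) uses scikit-learn's `calinski_harabasz_score` to show that an internal measure needs no ground truth: it scores a partition from the data and the labels alone, and rewards compact, well-separated clusters over an arbitrary one.

```python
# Sketch (assumed setup, not the article's MOA experiments): compare the
# Calinski-Harabasz index of a sensible k-means partition against a random
# relabeling of the same data. No ground-truth labels are consulted.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Three well-separated Gaussian clusters, a stand-in for one stream snapshot.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# A good partition: k-means with the correct number of clusters.
good = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# A poor partition: the same labels shuffled at random.
bad = np.random.default_rng(0).permutation(good)

# The index is a ratio of between-cluster to within-cluster dispersion,
# so the good partition scores far higher.
print(calinski_harabasz_score(X, good) > calinski_harabasz_score(X, bad))  # True
```

On an evolving stream, such a score would be recomputed on each window of recent points, which is what makes a measure's robustness to stream-specific errors (as studied in the article) matter.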
Original language: English
Pages (from-to): 171–183
Number of pages: 13
Journal: Vietnam Journal of Computer Science
Issue number: 3
Publication status: Published - 1 Aug 2017


  • Stream clustering · Internal evaluation measures · Clustering validation · MOA


