Samenvatting
Most convex and nonconvex clustering algorithms come with one crucial parameter: the k in k-means. To this day, there is not one generally accepted way to accurately determine this parameter. Popular methods are simple yet theoretically unfounded, such as searching for an elbow in the curve of a given cost-measure. In contrast, statistically founded methods often make strict assumptions over the data distribution or come with their own optimization scheme for the clustering objective. This limits either the set of applicable datasets or clustering algorithms. In this paper, we strive to determine the number of clusters by answering a simple question: given two clusters, is it likely that they jointly stem from a single distribution? To this end, we propose a bound on the probability that two clusters originate from the distribution of the unified cluster, specified only by the sample mean and variance. Our method is applicable as a simple wrapper to the result of any clustering method minimizing the objective of k-means, which includes Gaussian mixtures and Spectral Clustering. We focus in our experimental evaluation on an application for nonconvex clustering and demonstrate the suitability of our theoretical results.
Originele taal-2 | Engels |
---|---|
Titel | Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings |
Subtitel | European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I |
Redacteuren | Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet |
Plaats van productie | Cham |
Uitgeverij | Springer Nature |
Pagina's | 257-273 |
Aantal pagina's | 17 |
ISBN van elektronische versie | 978-3-030-46150-8 |
ISBN van geprinte versie | 978-3-030-46149-2 |
DOI's | |
Status | Gepubliceerd - 2020 |
Evenement | 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Wurzburg, Duitsland Duur: 16 sep. 2019 → 20 sep. 2019 Congresnummer: 19 http://ecmlpkdd2019.org/ |
Publicatie series
Naam | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11906 LNAI |
ISSN van geprinte versie | 0302-9743 |
ISSN van elektronische versie | 1611-3349 |
Congres
Congres | 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) |
---|---|
Verkorte titel | ECML PKDD 2019 |
Land/Regio | Duitsland |
Stad | Wurzburg |
Periode | 16/09/19 → 20/09/19 |
Internet adres |