k is the magic number: inferring the number of clusters through nonparametric concentration inequalities

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

3 Citations (Scopus)
3 Downloads (Pure)

Abstract

Most convex and nonconvex clustering algorithms come with one crucial parameter: the k in k-means. To this day, there is not one generally accepted way to accurately determine this parameter. Popular methods are simple yet theoretically unfounded, such as searching for an elbow in the curve of a given cost-measure. In contrast, statistically founded methods often make strict assumptions about the data distribution or come with their own optimization scheme for the clustering objective. This limits either the set of applicable datasets or clustering algorithms. In this paper, we strive to determine the number of clusters by answering a simple question: given two clusters, is it likely that they jointly stem from a single distribution? To this end, we propose a bound on the probability that two clusters originate from the distribution of the unified cluster, specified only by the sample mean and variance. Our method is applicable as a simple wrapper to the result of any clustering method minimizing the objective of k-means, which includes Gaussian mixtures and Spectral Clustering. We focus in our experimental evaluation on an application for nonconvex clustering and demonstrate the suitability of our theoretical results.
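
The question posed in the abstract (do two clusters plausibly stem from one distribution described only by its sample mean and variance?) can be illustrated with a rough sketch. The code below does not reproduce the bound proposed in the paper. It swaps in the textbook multivariate Chebyshev inequality for a sample mean, P(||mean_n - mu|| >= t) <= tr(Sigma) / (n t^2), with mu and Sigma estimated from the unified cluster. The function names `mean_deviation_bound` and `single_distribution_bound` are hypothetical, and the sketch assumes each cluster's points are independent draws from the unified distribution, which is not true of clusters returned by an algorithm such as k-means.

```python
import numpy as np


def mean_deviation_bound(cluster, union):
    """Chebyshev-type bound on P(||mean of n draws - mu|| >= observed distance).

    Assumption (illustrative only): the cluster's n points are i.i.d. draws
    from a distribution with the union's sample mean and variance.
    """
    mu = union.mean(axis=0)
    total_var = union.var(axis=0).sum()  # trace of the sample covariance
    dist_sq = float(np.sum((cluster.mean(axis=0) - mu) ** 2))
    if dist_sq == 0.0:
        return 1.0  # no deviation observed, so the bound is vacuous
    return min(1.0, total_var / (len(cluster) * dist_sq))


def single_distribution_bound(a, b):
    """Bound on the probability that both cluster means deviate as observed.

    P(A and B) <= min(P(A), P(B)), so the smaller per-cluster bound applies.
    """
    union = np.vstack([a, b])
    return min(mean_deviation_bound(a, union), mean_deviation_bound(b, union))


rng = np.random.default_rng(0)
far = single_distribution_bound(
    rng.normal(0, 1, (200, 2)), rng.normal(10, 1, (200, 2))
)
near = single_distribution_bound(
    rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2))
)
print(f"well separated: {far:.4f}, same source: {near:.4f}")
```

A small value suggests that the two clusters are unlikely to share one distribution and should stay separate, while a value near 1 gives no evidence against merging them. Used as a wrapper, such a check would be applied to pairs of clusters returned by any k-means-style method.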
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings
Subtitle: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I
Editors: Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet
Place of publication: Cham
Publisher: Springer Nature
Pages: 257-273
Number of pages: 17
ISBN (electronic): 978-3-030-46150-8
ISBN (print): 978-3-030-46149-2
Status: Published - 2020
Event: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Würzburg, Germany
Duration: 16 Sep 2019 to 20 Sep 2019
Conference number: 19
http://ecmlpkdd2019.org/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11906 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019)
Short title: ECML PKDD 2019
Country/Territory: Germany
City: Würzburg
Period: 16/09/19 to 20/09/19
