A practical algorithm for spatial agglomerative clustering

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

19 Downloads (Pure)

Abstract

We study an agglomerative clustering problem motivated by visualizing disjoint glyphs (represented by geometric shapes) centered at specific locations on a geographic map. As we zoom out, the glyphs grow and start to overlap. We replace overlapping glyphs by one larger merged glyph to maintain disjointness. Our goal is to compute the resulting hierarchical clustering efficiently in practice. A straightforward algorithm for such spatial agglomerative clustering runs in On 2 log n time, where n is the number of glyphs. This is not efficient enough for many real-world datasets which contain up to tens or hundreds of thousands of glyphs. Recently the theoretical upper bound was improved to Onα(n) log 7 n time [10], where α(n) is the extremely slow growing inverse Ackermann function. Although this new algorithm is asymptotically much faster than the naive algorithm, from a practical point of view, it does not perform better for n ≤ 10 6 . In this paper we present a new agglomerative clustering algorithm which works efficiently in practice. Our algorithm relies on the use of quadtrees to speed up spatial computations. Interestingly, even in non-pathological datasets we can encounter large glyphs that intersect many quadtree cells and that are involved in many clustering events. We therefore devise a special strategy to handle such large glyphs. We test our algorithm on several synthetic and real-world datasets and show that it performs well in practice.

Original languageEnglish
Title of host publicationProceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments
Place of PublicationPhiladelphia
PublisherSociety for Industrial and Applied Mathematics (SIAM)
Pages174-185
Number of pages12
ISBN (Electronic)978-1-61197-549-9
DOIs
Publication statusPublished - 2019
Event21st Workshop on Algorithm Engineering and Experiments, ALENEX 2019 - San Diego, United States
Duration: 7 Jan 20198 Jan 2019

Conference

Conference21st Workshop on Algorithm Engineering and Experiments, ALENEX 2019
CountryUnited States
CitySan Diego
Period7/01/198/01/19

Fingerprint

Spatial Clustering
Quadtree
Clustering
Inverse function
Hierarchical Clustering
Intersect
Clustering algorithms
Clustering Algorithm
Overlapping
Overlap
Disjoint
Speedup
Upper bound
Cell

Cite this

Castermans, T., Speckmann, B., & Verbeek, K. (2019). A practical algorithm for spatial agglomerative clustering. In Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments (pp. 174-185). Philadelphia: Society for Industrial and Applied Mathematics (SIAM). https://doi.org/10.1137/1.9781611975499.14
Castermans, Thom ; Speckmann, Bettina ; Verbeek, Kevin. / A practical algorithm for spatial agglomerative clustering. Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments. Philadelphia : Society for Industrial and Applied Mathematics (SIAM), 2019. pp. 174-185
@inproceedings{9075ba89171f4f31a0f5ea6ed33ca4fa,
title = "A practical algorithm for spatial agglomerative clustering",
abstract = "We study an agglomerative clustering problem motivated by visualizing disjoint glyphs (represented by geometric shapes) centered at specific locations on a geographic map. As we zoom out, the glyphs grow and start to overlap. We replace overlapping glyphs by one larger merged glyph to maintain disjointness. Our goal is to compute the resulting hierarchical clustering efficiently in practice. A straightforward algorithm for such spatial agglomerative clustering runs in On 2 log n time, where n is the number of glyphs. This is not efficient enough for many real-world datasets which contain up to tens or hundreds of thousands of glyphs. Recently the theoretical upper bound was improved to Onα(n) log 7 n time [10], where α(n) is the extremely slow growing inverse Ackermann function. Although this new algorithm is asymptotically much faster than the naive algorithm, from a practical point of view, it does not perform better for n ≤ 10 6 . In this paper we present a new agglomerative clustering algorithm which works efficiently in practice. Our algorithm relies on the use of quadtrees to speed up spatial computations. Interestingly, even in non-pathological datasets we can encounter large glyphs that intersect many quadtree cells and that are involved in many clustering events. We therefore devise a special strategy to handle such large glyphs. We test our algorithm on several synthetic and real-world datasets and show that it performs well in practice.",
author = "Thom Castermans and Bettina Speckmann and Kevin Verbeek",
year = "2019",
doi = "10.1137/1.9781611975499.14",
language = "English",
pages = "174--185",
booktitle = "Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments",
publisher = "Society for Industrial and Applied Mathematics (SIAM)",
address = "United States",

}

Castermans, T, Speckmann, B & Verbeek, K 2019, A practical algorithm for spatial agglomerative clustering. in Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, pp. 174-185, 21st Workshop on Algorithm Engineering and Experiments, ALENEX 2019, San Diego, United States, 7/01/19. https://doi.org/10.1137/1.9781611975499.14

A practical algorithm for spatial agglomerative clustering. / Castermans, Thom; Speckmann, Bettina; Verbeek, Kevin.

Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments. Philadelphia : Society for Industrial and Applied Mathematics (SIAM), 2019. p. 174-185.

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

TY - GEN

T1 - A practical algorithm for spatial agglomerative clustering

AU - Castermans, Thom

AU - Speckmann, Bettina

AU - Verbeek, Kevin

PY - 2019

Y1 - 2019

N2 - We study an agglomerative clustering problem motivated by visualizing disjoint glyphs (represented by geometric shapes) centered at specific locations on a geographic map. As we zoom out, the glyphs grow and start to overlap. We replace overlapping glyphs by one larger merged glyph to maintain disjointness. Our goal is to compute the resulting hierarchical clustering efficiently in practice. A straightforward algorithm for such spatial agglomerative clustering runs in On 2 log n time, where n is the number of glyphs. This is not efficient enough for many real-world datasets which contain up to tens or hundreds of thousands of glyphs. Recently the theoretical upper bound was improved to Onα(n) log 7 n time [10], where α(n) is the extremely slow growing inverse Ackermann function. Although this new algorithm is asymptotically much faster than the naive algorithm, from a practical point of view, it does not perform better for n ≤ 10 6 . In this paper we present a new agglomerative clustering algorithm which works efficiently in practice. Our algorithm relies on the use of quadtrees to speed up spatial computations. Interestingly, even in non-pathological datasets we can encounter large glyphs that intersect many quadtree cells and that are involved in many clustering events. We therefore devise a special strategy to handle such large glyphs. We test our algorithm on several synthetic and real-world datasets and show that it performs well in practice.

AB - We study an agglomerative clustering problem motivated by visualizing disjoint glyphs (represented by geometric shapes) centered at specific locations on a geographic map. As we zoom out, the glyphs grow and start to overlap. We replace overlapping glyphs by one larger merged glyph to maintain disjointness. Our goal is to compute the resulting hierarchical clustering efficiently in practice. A straightforward algorithm for such spatial agglomerative clustering runs in On 2 log n time, where n is the number of glyphs. This is not efficient enough for many real-world datasets which contain up to tens or hundreds of thousands of glyphs. Recently the theoretical upper bound was improved to Onα(n) log 7 n time [10], where α(n) is the extremely slow growing inverse Ackermann function. Although this new algorithm is asymptotically much faster than the naive algorithm, from a practical point of view, it does not perform better for n ≤ 10 6 . In this paper we present a new agglomerative clustering algorithm which works efficiently in practice. Our algorithm relies on the use of quadtrees to speed up spatial computations. Interestingly, even in non-pathological datasets we can encounter large glyphs that intersect many quadtree cells and that are involved in many clustering events. We therefore devise a special strategy to handle such large glyphs. We test our algorithm on several synthetic and real-world datasets and show that it performs well in practice.

UR - http://www.scopus.com/inward/record.url?scp=85065203522&partnerID=8YFLogxK

U2 - 10.1137/1.9781611975499.14

DO - 10.1137/1.9781611975499.14

M3 - Conference contribution

AN - SCOPUS:85065203522

SP - 174

EP - 185

BT - Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments

PB - Society for Industrial and Applied Mathematics (SIAM)

CY - Philadelphia

ER -

Castermans T, Speckmann B, Verbeek K. A practical algorithm for spatial agglomerative clustering. In Proceedings of the Twenty-First Workshop on Algorithm Engineering and Experiments. Philadelphia: Society for Industrial and Applied Mathematics (SIAM). 2019. p. 174-185 https://doi.org/10.1137/1.9781611975499.14