Abstract
Let P be a set of n colored points in R d. We develop efficient data structures that store P and can answer chromatic k-nearest neighbor (k-NN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is the key to obtain fast k-NN classifiers. Our main aim is to obtain query times that are independent of k while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the k-nearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear-space data structures with query times of (Formula Presented) for points in R 1, and with query times varying between (Formula Presented) and O(n 5/6 polylog n), depending on the distance measure used, for points in R 2. Since these query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least (1 − ε)f *times, where f ∗ is the frequency of the most frequent color, we obtain a query time of (Formula Presented) and expected query times ranging between (Formula Presented) and (Formula Presented) in R 2 using near-linear space (ignoring polylogarithmic factors). All of our data structures are for the pointer-machine model.
Original language | English |
---|---|
Pages (from-to) | 65-107 |
Number of pages | 43 |
Journal | Journal of Computational Geometry |
Volume | 16 |
Issue number | 1 |
DOIs | |
Publication status | Published - 19 Feb 2025 |
Funding
Maarten Löffler was partially supported by the Dutch Research Council (NWO) under the project numbers 614.001.504 and 628.011.005.
Funders | Funder number |
---|---|
Nederlandse Organisatie voor Wetenschappelijk Onderzoek | 628.011.005, 614.001.504 |