Polytopes, lattices, and spherical codes for the nearest neighbor problem

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

Abstract

We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a random low-dimensional subspace, and then partitioning the projected vectors according to the Voronoi cells induced by a well-chosen spherical code. This approach generalizes and interpolates between the fast but asymptotically suboptimal hyperplane hashing of Charikar [STOC 2002], and asymptotically optimal but practically often slower hash families of e.g. Andoni-Indyk [FOCS 2006], Andoni-Indyk-Nguyen-Razenshteyn [SODA 2014] and Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt [NIPS 2015]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes appearing in the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike e.g. cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide any order terms which vanish only for large d, thus facilitating an easier parameter selection in practical applications. For the two-dimensional case, we analytically derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron and 5-cell (the 3-simplex and 4-simplex) and the 16-cell (the 4-orthoplex) achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families Ak and Dk in terms of minimizing the query exponent. We provide lower bounds based on spherical caps, and we predict that in higher dimensions, larger spherical codes exist which outperform orthoplices in terms of the query exponent, and we argue why using the Dk root lattices will likely lead to better results in practice as well (compared to using cross-polytopes), due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.

Original languageEnglish
Title of host publication47th International Colloquium on Automata, Languages, and Programming, ICALP 2020
EditorsArtur Czumaj, Anuj Dawar, Emanuela Merelli
PublisherSchloss Dagstuhl - Leibniz-Zentrum für Informatik
Pages76:1-76:14
Number of pages14
ISBN (Electronic)9783959771382
DOIs
Publication statusPublished - 1 Jun 2020
Event47th International Colloquium on Automata, Languages, and Programming, ICALP 2020 - Virtual, Online, Germany
Duration: 8 Jul 202011 Jul 2020

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume168
ISSN (Print)1868-8969

Conference

Conference47th International Colloquium on Automata, Languages, and Programming, ICALP 2020
Country/TerritoryGermany
CityVirtual, Online
Period8/07/2011/07/20

Funding

Funding The author is supported by an NWO Veni grant with project number 016.Veni.192.005. Part of this work was done while the author was visiting the Simons Institute for the Theory of Computing at the University of California, Berkeley.

FundersFunder number
University of California at Berkeley
Nederlandse Organisatie voor Wetenschappelijk Onderzoek016

    Keywords

    • (approximate) nearest neighbor problem
    • Lattices
    • Locality-sensitive hashing (LSH)
    • Polytopes
    • Spherical codes

    Fingerprint

    Dive into the research topics of 'Polytopes, lattices, and spherical codes for the nearest neighbor problem'. Together they form a unique fingerprint.

    Cite this