## Samenvatting

We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a random low-dimensional subspace, and then partitioning the projected vectors according to the Voronoi cells induced by a well-chosen spherical code. This approach generalizes and interpolates between the fast but asymptotically suboptimal hyperplane hashing of Charikar [STOC 2002], and asymptotically optimal but practically often slower hash families of e.g. Andoni-Indyk [FOCS 2006], Andoni-Indyk-Nguyen-Razenshteyn [SODA 2014] and Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt [NIPS 2015]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes appearing in the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike e.g. cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide any order terms which vanish only for large d, thus facilitating an easier parameter selection in practical applications. For the two-dimensional case, we analytically derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron and 5-cell (the 3-simplex and 4-simplex) and the 16-cell (the 4-orthoplex) achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families A_{k} and D_{k} in terms of minimizing the query exponent. We provide lower bounds based on spherical caps, and we predict that in higher dimensions, larger spherical codes exist which outperform orthoplices in terms of the query exponent, and we argue why using the D_{k} root lattices will likely lead to better results in practice as well (compared to using cross-polytopes), due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.

Originele taal-2 | Engels |
---|---|

Titel | 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020 |

Redacteuren | Artur Czumaj, Anuj Dawar, Emanuela Merelli |

Uitgeverij | Schloss Dagstuhl - Leibniz-Zentrum für Informatik |

ISBN van elektronische versie | 9783959771382 |

DOI's | |

Status | Gepubliceerd - 1 jun 2020 |

Evenement | 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020 - Virtual, Online, Duitsland Duur: 8 jul 2020 → 11 jul 2020 |

### Publicatie series

Naam | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|

Volume | 168 |

ISSN van geprinte versie | 1868-8969 |

### Congres

Congres | 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020 |
---|---|

Land | Duitsland |

Stad | Virtual, Online |

Periode | 8/07/20 → 11/07/20 |