TY - GEN
T1 - Polytopes, lattices, and spherical codes for the nearest neighbor problem
AU - Laarhoven, Thijs
PY - 2020/6/1
Y1 - 2020/6/1
N2 - We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a random low-dimensional subspace, and then partitioning the projected vectors according to the Voronoi cells induced by a well-chosen spherical code. This approach generalizes and interpolates between the fast but asymptotically suboptimal hyperplane hashing of Charikar [STOC 2002], and asymptotically optimal but practically often slower hash families of e.g. Andoni-Indyk [FOCS 2006], Andoni-Indyk-Nguyen-Razenshteyn [SODA 2014] and Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt [NIPS 2015]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes appearing in the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike e.g. cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide any order terms which vanish only for large d, thus facilitating an easier parameter selection in practical applications. For the two-dimensional case, we analytically derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron and 5-cell (the 3-simplex and 4-simplex) and the 16-cell (the 4-orthoplex) achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families Ak and Dk in terms of minimizing the query exponent. We provide lower bounds based on spherical caps, and we predict that in higher dimensions, larger spherical codes exist which outperform orthoplices in terms of the query exponent, and we argue why using the Dk root lattices will likely lead to better results in practice as well (compared to using cross-polytopes), due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.
AB - We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a random low-dimensional subspace, and then partitioning the projected vectors according to the Voronoi cells induced by a well-chosen spherical code. This approach generalizes and interpolates between the fast but asymptotically suboptimal hyperplane hashing of Charikar [STOC 2002], and asymptotically optimal but practically often slower hash families of e.g. Andoni-Indyk [FOCS 2006], Andoni-Indyk-Nguyen-Razenshteyn [SODA 2014] and Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt [NIPS 2015]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes appearing in the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike e.g. cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide any order terms which vanish only for large d, thus facilitating an easier parameter selection in practical applications. For the two-dimensional case, we analytically derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron and 5-cell (the 3-simplex and 4-simplex) and the 16-cell (the 4-orthoplex) achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families Ak and Dk in terms of minimizing the query exponent. We provide lower bounds based on spherical caps, and we predict that in higher dimensions, larger spherical codes exist which outperform orthoplices in terms of the query exponent, and we argue why using the Dk root lattices will likely lead to better results in practice as well (compared to using cross-polytopes), due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.
KW - (approximate) nearest neighbor problem
KW - Lattices
KW - Locality-sensitive hashing (LSH)
KW - Polytopes
KW - Spherical codes
UR - http://www.scopus.com/inward/record.url?scp=85089343659&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ICALP.2020.76
DO - 10.4230/LIPIcs.ICALP.2020.76
M3 - Conference contribution
AN - SCOPUS:85089343659
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 76:1-76:14
BT - 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020
A2 - Czumaj, Artur
A2 - Dawar, Anuj
A2 - Merelli, Emanuela
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik
T2 - 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020
Y2 - 8 July 2020 through 11 July 2020
ER -