TY - JOUR

T1 - Optimal hashing-based time-space trade-offs for approximate near neighbors

AU - Andoni, Alexandr

AU - Laarhoven, Thijs

AU - Razenshteyn, Ilya

AU - Waingarten, Erik

N1 - 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. New version contains more elaborated proofs and fixed some typos

PY - 2016/8/11

Y1 - 2016/8/11

N2 - [See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $\rho_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

AB - [See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $\rho_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

KW - cs.DS

KW - cs.CC

KW - cs.CG

KW - cs.IR

M3 - Article

JO - arXiv.org,e-Print Archive, Mathematics

JF - arXiv.org,e-Print Archive, Mathematics

ER -