TY - GEN
T1 - HGraph
T2 - DEXA 2019 - 30th International Conference on Database and Expert Systems Applications
AU - Capobianco Shimomura, Larissa
AU - Kaster, Daniel
PY - 2019
Y1 - 2019
N2 - Similarity search is a common approach to support new applications that deal with complex data (e.g., images, videos, georeferenced data, etc.). As a consequence, appropriate indexing structures to support this task have been proposed in the literature. Recently, graph-based methods have shown to be very efficient for approximate similarity search. However, some of the main types of graphs used still suffer from two main drawbacks: (i) slow construction, and (ii) inaccurate retrieval. To reduce these drawbacks, in this paper, we propose the HGraph method. HGraph is a divide-and-conquer method for building graphs for similarity search that recursively partitions the input dataset and connect vertices across partitions at different levels. The method can be used with different types of graphs proposed in the literature to speed up the graph construction time as well as to increase the approximate search results quality through long-range edges connecting pivots of different partitions. We present experimental results using real datasets that show that HGraph applied to the k-NNG graph was able to decrease the construction time while increasing the approximate search recall when compared to the k-NNG. Regarding the application of HGraph to the NSW graph, the query recall also increased, however with a higher computational cost. An analysis of different combinations of the tested methods demonstrated HGraph query times given a recall rate were always among the top results regarding different setups.
AB - Similarity search is a common approach to support new applications that deal with complex data (e.g., images, videos, georeferenced data, etc.). As a consequence, appropriate indexing structures to support this task have been proposed in the literature. Recently, graph-based methods have shown to be very efficient for approximate similarity search. However, some of the main types of graphs used still suffer from two main drawbacks: (i) slow construction, and (ii) inaccurate retrieval. To reduce these drawbacks, in this paper, we propose the HGraph method. HGraph is a divide-and-conquer method for building graphs for similarity search that recursively partitions the input dataset and connect vertices across partitions at different levels. The method can be used with different types of graphs proposed in the literature to speed up the graph construction time as well as to increase the approximate search results quality through long-range edges connecting pivots of different partitions. We present experimental results using real datasets that show that HGraph applied to the k-NNG graph was able to decrease the construction time while increasing the approximate search recall when compared to the k-NNG. Regarding the application of HGraph to the NSW graph, the query recall also increased, however with a higher computational cost. An analysis of different combinations of the tested methods demonstrated HGraph query times given a recall rate were always among the top results regarding different setups.
KW - Metric spaces
KW - Proximity Graphs
KW - Similarity search
UR - http://www.scopus.com/inward/record.url?scp=85077136689&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-27615-7_8
DO - 10.1007/978-3-030-27615-7_8
M3 - Conference contribution
SN - 978-3-030-27614-0
VL - 11706
T3 - Lecture Notes in Computer Science
SP - 106
EP - 121
BT - Database and Expert Systems Applications - 30th International Conference, DEXA 2019, Proceedings
A2 - Hartmann, Sven
A2 - Küng, Josef
A2 - Anderst-Kotsis, Gabriele
A2 - Khalil, Ismail
A2 - Chakravarthy, Sharma
A2 - Tjoa, A Min
PB - Springer
CY - Cham
Y2 - 26 August 2019 through 29 August 2019
ER -