TY - GEN
T1 - Investigations on path indexing for graph databases
AU - Sumrall, Jonathan M.
AU - Fletcher, George H.L.
AU - Poulovassilis, Alexandra
AU - Svensson, Johan
AU - Vejlstrup, Magnus
AU - Vest, Chris
AU - Webber, Jim
PY - 2017/1/1
Y1 - 2017/1/1
N2 - Graph databases have become an increasingly popular choice for the management of the massive network data sets arising in many contemporary applications. We investigate the effectiveness of path indexing for accelerating query processing in graph database systems, using as an exemplar the widely used open-source Neo4j graph database. We present a novel path index design which supports efficient ordered access to paths in a graph dataset. Our index is fully persistent and designed for external memory storage and retrieval. We also describe a compression scheme that exploits the limited differences between consecutive keys in the index, as well as a workload-driven approach to indexing. We demonstrate empirically the speed-ups achieved by our implementation, showing that the path index yields query run-times from 2x up to 8000x faster than Neo4j. Empirical evaluation also shows that our scheme leads to smaller indexes than using general-purpose LZ4 compression. The complete stand-alone implementation of our index, as well as supporting tooling such as a bulk-loader, are provided as open source for further research and development.
AB - Graph databases have become an increasingly popular choice for the management of the massive network data sets arising in many contemporary applications. We investigate the effectiveness of path indexing for accelerating query processing in graph database systems, using as an exemplar the widely used open-source Neo4j graph database. We present a novel path index design which supports efficient ordered access to paths in a graph dataset. Our index is fully persistent and designed for external memory storage and retrieval. We also describe a compression scheme that exploits the limited differences between consecutive keys in the index, as well as a workload-driven approach to indexing. We demonstrate empirically the speed-ups achieved by our implementation, showing that the path index yields query run-times from 2x up to 8000x faster than Neo4j. Empirical evaluation also shows that our scheme leads to smaller indexes than using general-purpose LZ4 compression. The complete stand-alone implementation of our index, as well as supporting tooling such as a bulk-loader, are provided as open source for further research and development.
UR - http://www.scopus.com/inward/record.url?scp=85020464825&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-58943-5_43
DO - 10.1007/978-3-319-58943-5_43
M3 - Conference contribution
AN - SCOPUS:85020464825
SN - 978-3-319-58942-8
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 532
EP - 544
BT - Euro-Par 2016
A2 - Dutot, Pierre-Francois
A2 - Desprez, Frederic
PB - Springer
CY - Cham
T2 - 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016
Y2 - 24 August 2016 through 26 August 2016
ER -