Efficient external-memory bisimulation on DAGs

J. Hellings, G.H.L. Fletcher, H.J. Haverkort

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

15 Citations (Scopus)

Abstract

In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific work flows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
Original languageEnglish
Title of host publicationProceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012)
Place of PublicationNew York NY
PublisherAssociation for Computing Machinery, Inc
Pages553-564
ISBN (Print)978-1-4503-1247-9
DOIs
Publication statusPublished - 2012

Fingerprint

Data storage equipment
XML
Equivalence classes
Taxonomies
World Wide Web
Data structures

Cite this

Hellings, J., Fletcher, G. H. L., & Haverkort, H. J. (2012). Efficient external-memory bisimulation on DAGs. In Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012) (pp. 553-564). New York NY: Association for Computing Machinery, Inc. https://doi.org/10.1145/2213836.2213899
Hellings, J. ; Fletcher, G.H.L. ; Haverkort, H.J. / Efficient external-memory bisimulation on DAGs. Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012). New York NY : Association for Computing Machinery, Inc, 2012. pp. 553-564
@inproceedings{7aacf9fed01a4ecaa9859cbf2e4441f0,
title = "Efficient external-memory bisimulation on DAGs",
abstract = "In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific work flows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.",
author = "J. Hellings and G.H.L. Fletcher and H.J. Haverkort",
year = "2012",
doi = "10.1145/2213836.2213899",
language = "English",
isbn = "978-1-4503-1247-9",
pages = "553--564",
booktitle = "Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012)",
publisher = "Association for Computing Machinery, Inc",
address = "United States",

}

Hellings, J, Fletcher, GHL & Haverkort, HJ 2012, Efficient external-memory bisimulation on DAGs. in Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012). Association for Computing Machinery, Inc, New York NY, pp. 553-564. https://doi.org/10.1145/2213836.2213899

Efficient external-memory bisimulation on DAGs. / Hellings, J.; Fletcher, G.H.L.; Haverkort, H.J.

Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012). New York NY : Association for Computing Machinery, Inc, 2012. p. 553-564.

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

TY - GEN

T1 - Efficient external-memory bisimulation on DAGs

AU - Hellings, J.

AU - Fletcher, G.H.L.

AU - Haverkort, H.J.

PY - 2012

Y1 - 2012

N2 - In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific work flows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.

AB - In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific work flows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.

U2 - 10.1145/2213836.2213899

DO - 10.1145/2213836.2213899

M3 - Conference contribution

SN - 978-1-4503-1247-9

SP - 553

EP - 564

BT - Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012)

PB - Association for Computing Machinery, Inc

CY - New York NY

ER -

Hellings J, Fletcher GHL, Haverkort HJ. Efficient external-memory bisimulation on DAGs. In Proceedings of the 31st ACM SIGMOD International Conference on Management of Data (Scottsdale AZ, USA, May 20-24, 2012). New York NY: Association for Computing Machinery, Inc. 2012. p. 553-564 https://doi.org/10.1145/2213836.2213899