gMark: schema-driven generation of graphs and queries

G. Bagan, A. Bonifati, R. Ciucanu, G.H.L. Fletcher, A. Lemay, N. Advokaat

Research output: Contribution to journal › Article › Academic › peer-review

85 Citations (Scopus)


Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.

Original language: English
Article number: 7762945
Pages (from-to): 856-869
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 4
Publication status: Published - 1 Apr 2017
Event: 33rd IEEE International Conference on Data Engineering (ICDE 2017), April 19-22, 2017, San Diego, California, USA - Hilton San Diego Resort and Spa, San Diego, United States
Duration: 19 Apr 2017 to 22 Apr 2017
Conference number: 33


Keywords:

  • Graph databases
  • benchmarking
  • recursive queries
  • selectivity estimation

