Towards Distributed Model Analytics with Apache Spark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The growing number of models and other related artefacts in model-driven engineering has recently led to the emergence of approaches and tools for analyzing and managing them on a large scale. The framework SAMOS applies techniques inspired by information retrieval and data mining to analyze large sets of models. As data size and analysis complexity grow, however, further scalability is needed. In this paper we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phases. We present preliminary studies using a cluster infrastructure and report the results for two datasets: one with 250 Ecore metamodels, for which we detail the performance gain under various settings, and a larger one of 7.3k metamodels with nearly one million model elements, which further demonstrates scalability.
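
The partition-and-parallelize idea described in the abstract can be illustrated with a minimal sketch. This is not the SAMOS implementation and it does not use Spark: it is a stand-in built on Python's standard library, and every name in it (`jaccard_distance`, `compare_block`, `pairwise_distances`), the set-of-strings model representation, and the toy distance are assumptions made for illustration. All index pairs of models are split into partitions, and each partition is compared by a separate worker.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard_distance(a, b):
    """Toy distance between two feature sets (an assumption, not SAMOS's metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def compare_block(models, pairs):
    """Compare one partition of index pairs; returns {(i, j): distance}."""
    return {(i, j): jaccard_distance(models[i], models[j]) for i, j in pairs}


def pairwise_distances(models, num_partitions=4):
    """Split all index pairs into partitions and compare them concurrently."""
    pairs = list(combinations(range(len(models)), 2))
    # Round-robin partitioning keeps the partitions roughly equal in size.
    partitions = [pairs[k::num_partitions] for k in range(num_partitions)]
    result = {}
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for block in pool.map(lambda p: compare_block(models, p), partitions):
            result.update(block)
    return result


# Hypothetical feature sets standing in for extracted metamodel elements.
models = [
    {"Person", "name", "age"},
    {"Person", "name", "address"},
    {"Order", "item", "price"},
]
distances = pairwise_distances(models, num_partitions=2)
```

Threads are used here only to keep the sketch self-contained; on a cluster engine such as Spark, the partitions would instead be distributed across worker nodes.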

Original language: English
Title of host publication: MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development
Editors: Slimane Hammoudi, Luis Ferreira Pires, Bran Selic
Publisher: SciTePress Digital Library
Pages: 767-772
Number of pages: 6
ISBN (Electronic): 978-989-758-283-7
DOIs
Publication status: Published - 2018
Event: 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018 - Funchal, Madeira, Portugal
Duration: 22 Jan 2018 to 24 Jan 2018

Conference

Conference: 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018
Country/Territory: Portugal
City: Funchal, Madeira
Period: 22/01/18 to 24/01/18

Funding

This work is supported by the 4TU.NIRICT Research Community Funding on Model Management and Analytics in the Netherlands. We would also like to thank SURF, the collaborative ICT organisation for Dutch education and research, for providing us with computational infrastructure and support.

Keywords

  • Apache Spark
  • Big Data
  • Distributed Computing
  • Model Analytics
  • Model-Driven Engineering
  • Scalability
