Towards Distributed Model Analytics with Apache Spark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The growing number of models and other related artefacts in model-driven engineering has recently led to the emergence of approaches and tools for analyzing and managing them on a large scale. The SAMOS framework applies techniques inspired by information retrieval and data mining to analyze large sets of models. As data size and analysis complexity grow, however, further scalability is needed. In this paper we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phases. We present preliminary studies using a cluster infrastructure and report the results for two datasets: one with 250 Ecore metamodels, for which we detail the performance gain under various settings, and a larger one of 7.3k metamodels with nearly one million model elements, which further demonstrates scalability.
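The abstract says only that SAMOS is ported to Spark by partitioning the data and parallelizing the comparison phase. As a rough illustration of why all-pairs model comparison lends itself to this, the Python sketch below splits the models into k chunks and treats every chunk pair (i, j) with i <= j as an independent task, so each pair of models is compared exactly once. Everything here is an assumption for illustration: the string feature sets, the Jaccard measure, and the helper names `partition`, `compare_block` and `pairwise_similarities` are hypothetical and are not taken from SAMOS, whose actual features, distance measures and Spark job layout are not described in this record. A local thread pool stands in for distributed executors.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, combinations_with_replacement

# Hypothetical representation: each model is a frozenset of string features
# (e.g. element names). SAMOS's real feature extraction is not shown here.


def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed, simple measure)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def partition(items, k):
    """Split a list into k contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def compare_block(block):
    """Compare every model in one chunk against every model in another.

    A diagonal block (a chunk paired with itself) only compares distinct
    pairs inside the chunk, so no pair is computed twice overall.
    """
    left, right, diagonal = block
    pairs = combinations(left, 2) if diagonal else ((x, y) for x in left for y in right)
    return [((ida, idb), jaccard(fa, fb)) for (ida, fa), (idb, fb) in pairs]


def pairwise_similarities(models, k=4, workers=4):
    """All-pairs similarity, split into independent block tasks.

    Chunk pairs (i, j) with i <= j cover the upper triangle of the
    similarity matrix exactly once; each block could run on its own worker.
    """
    items = sorted(models.items())  # sort by model id so chunk order is stable
    chunks = partition(items, k)
    blocks = [(chunks[i], chunks[j], i == j)
              for i, j in combinations_with_replacement(range(k), 2)]
    result = {}
    # A thread pool stands in for distributed executors in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(compare_block, blocks):
            result.update(part)
    return result


# Toy input with made-up feature sets, for illustration only.
example = {
    "m1": frozenset({"Class:Library", "Class:Book", "Attr:title"}),
    "m2": frozenset({"Class:Library", "Class:Book", "Attr:isbn"}),
    "m3": frozenset({"Class:Person", "Attr:name"}),
}
sims = pairwise_similarities(example, k=2)
print(sims[("m1", "m2")])  # 0.5: two shared features out of four distinct ones
```

In an actual Spark job one might instead distribute the block list with `SparkContext.parallelize` and apply `flatMap(compare_block)` to it; whether SAMOS divides its work in this way is not stated here.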

Original language: English
Title of host publication: MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development
Editors: Slimane Hammoudi, Luis Ferreira Pires
Publisher: SCITEPRESS-Science and Technology Publications, Lda.
Pages: 767-772
Number of pages: 6
ISBN (Electronic): 978-989-758-283-7
DOI: 10.5220/0006735407670772
Publication status: Published - 1 Jan 2018
Event: 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018 - Funchal, Madeira, Portugal
Duration: 22 Jan 2018 to 24 Jan 2018

Conference

Conference: 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018
Country: Portugal
City: Funchal, Madeira
Period: 22/01/18 to 24/01/18

Keywords

  • Apache Spark
  • Big Data
  • Distributed Computing
  • Model Analytics
  • Model-Driven Engineering
  • Scalability

Cite this

Babur, Ö., Cleophas, L., & van den Brand, M. (2018). Towards distributed model analytics with Apache Spark. In S. Hammoudi & L. Ferreira Pires (Eds.), MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development (pp. 767-772). SCITEPRESS-Science and Technology Publications, Lda. https://doi.org/10.5220/0006735407670772
Babur, Önder; Cleophas, Loek; van den Brand, Mark. / Towards distributed model analytics with Apache Spark. MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development. editor / Slimane Hammoudi; Luis Ferreira Pires. SCITEPRESS-Science and Technology Publications, Lda., 2018. pp. 767-772
@inproceedings{e7aad9864ee74a54b9e2598ce785e536,
title = "Towards distributed model analytics with {Apache Spark}",
abstract = "The growing number of models and other related artefacts in model-driven engineering has recently led to the emergence of approaches and tools for analyzing and managing them on a large scale. The framework SAMOS applies techniques inspired by information retrieval and data mining to analyze large sets of models. As the data size and analysis complexity goes up, however, further scalability is needed. In this paper we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phase. We present preliminary studies using a cluster infrastructure and report the results for two datasets: one with 250 Ecore metamodels where we detail the performance gain with various settings, and a larger one of 7.3k metamodels with nearly one million model elements for further demonstrating scalability.",
keywords = "Apache Spark, Big Data, Distributed Computing, Model Analytics, Model-Driven Engineering, Scalability",
author = "{\"O}nder Babur and Loek Cleophas and {van den Brand}, Mark",
year = "2018",
month = "1",
day = "1",
doi = "10.5220/0006735407670772",
language = "English",
pages = "767--772",
editor = "Hammoudi, Slimane and {Ferreira Pires}, Luis",
booktitle = "MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development",
publisher = "SCITEPRESS-Science and Technology Publications, Lda.",

}

Babur, Ö, Cleophas, L & van den Brand, M 2018, Towards distributed model analytics with Apache Spark. in S Hammoudi & L Ferreira Pires (eds), MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development. SCITEPRESS-Science and Technology Publications, Lda., pp. 767-772, 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018, Funchal, Madeira, Portugal, 22/01/18. https://doi.org/10.5220/0006735407670772


TY - GEN

T1 - Towards distributed model analytics with Apache Spark

AU - Babur, Önder

AU - Cleophas, Loek

AU - van den Brand, Mark

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The growing number of models and other related artefacts in model-driven engineering has recently led to the emergence of approaches and tools for analyzing and managing them on a large scale. The framework SAMOS applies techniques inspired by information retrieval and data mining to analyze large sets of models. As the data size and analysis complexity goes up, however, further scalability is needed. In this paper we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phase. We present preliminary studies using a cluster infrastructure and report the results for two datasets: one with 250 Ecore metamodels where we detail the performance gain with various settings, and a larger one of 7.3k metamodels with nearly one million model elements for further demonstrating scalability.

AB - The growing number of models and other related artefacts in model-driven engineering has recently led to the emergence of approaches and tools for analyzing and managing them on a large scale. The framework SAMOS applies techniques inspired by information retrieval and data mining to analyze large sets of models. As the data size and analysis complexity goes up, however, further scalability is needed. In this paper we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phase. We present preliminary studies using a cluster infrastructure and report the results for two datasets: one with 250 Ecore metamodels where we detail the performance gain with various settings, and a larger one of 7.3k metamodels with nearly one million model elements for further demonstrating scalability.

KW - Apache Spark

KW - Big Data

KW - Distributed Computing

KW - Model Analytics

KW - Model-Driven Engineering

KW - Scalability

UR - http://www.scopus.com/inward/record.url?scp=85052021429&partnerID=8YFLogxK

U2 - 10.5220/0006735407670772

DO - 10.5220/0006735407670772

M3 - Conference contribution

AN - SCOPUS:85052021429

SP - 767

EP - 772

BT - MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development

A2 - Hammoudi, Slimane

A2 - Ferreira Pires, Luis

PB - SCITEPRESS-Science and Technology Publications, Lda.

ER -

Babur Ö, Cleophas L, van den Brand M. Towards distributed model analytics with Apache Spark. In Hammoudi S, Ferreira Pires L, editors, MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development. SCITEPRESS-Science and Technology Publications, Lda. 2018. p. 767-772 https://doi.org/10.5220/0006735407670772