Abstract
The growing number of models and other related artefacts in model-driven engineering has recently led to the emergence of approaches and tools for analyzing and managing them on a large scale. The SAMOS framework applies techniques inspired by information retrieval and data mining to analyze large sets of models. As data size and analysis complexity grow, however, further scalability is needed. In this paper, we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phases. We present preliminary studies using a cluster infrastructure and report the results for two datasets: one with 250 Ecore metamodels, where we detail the performance gain with various settings, and a larger one of 7.3k metamodels with nearly one million model elements, to further demonstrate scalability.
Original language | English |
---|---|
Title of host publication | MODELSWARD 2018 - Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development |
Editors | Slimane Hammoudi, Luis Ferreira Pires, Bran Selic |
Publisher | SciTePress Digital Library |
Pages | 767-772 |
Number of pages | 6 |
ISBN (Electronic) | 978-989-758-283-7 |
Publication status | Published - 2018 |
Event | 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018, Funchal, Madeira, Portugal; Duration: 22 Jan 2018 → 24 Jan 2018 |
Conference
Conference | 6th International Conference on Model-Driven Engineering and Software Development, MODELSWARD 2018 |
---|---|
Country/Territory | Portugal |
City | Funchal, Madeira |
Period | 22/01/18 → 24/01/18 |
Funding
This work is supported by the 4TU.NIRICT Research Community Funding on Model Management and Analytics in the Netherlands. We would also like to thank SURF, the collaborative ICT organisation for Dutch education and research, for providing us with computational infrastructure and support.
Keywords
- Apache Spark
- Big Data
- Distributed Computing
- Model Analytics
- Model-Driven Engineering
- Scalability