Metamodel clone detection with SAMOS

Research output: Contribution to journal › Review article › Academic › peer-review

3 Citations (Scopus)
1 Downloads (Pure)

Abstract

Wider adoption of model-driven engineering leads to an abundance of models and metamodels in academic and industrial practice. One of the key techniques for the management and maintenance of such artifacts is model clone detection, where highly similar (meta-)models and (meta-)model fragments are mined from a typically large set of data. In this paper, we have extended the SAMOS framework (Statistical Analysis of MOdelS) to clone detection, exemplified on Ecore metamodels. Our clone detection approach uses and extends the framework's feature extraction, vector space model, natural language processing and clustering capabilities. We performed three extensive case studies to demonstrate its accuracy both quantitatively and qualitatively. We first compared the sensitivity and accuracy of SAMOS for metamodel changes through mutation and scenario analysis (which simulate clones) with those of NICAD-Ecore and MACH, tools for clone detection on Ecore and UML models, respectively. We then compared the precision and recall of SAMOS and of NICAD-Ecore on a real dataset consisting of conference management metamodels from the ATL Zoo. Finally, we performed a repository-wide mining of metamodel clones from GitHub. We conclude that SAMOS stands out with its higher accuracy and yet considerable scalability for further large-scale clone detection and other empirical studies on metamodels and domain-specific languages.
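
As a rough illustration of the vector space model idea mentioned in the abstract, the sketch below represents each metamodel as a bag of element names, turns the bags into count vectors, and flags pairs whose cosine similarity exceeds a threshold. It is not SAMOS's implementation: the toy metamodels, the function names (`to_vector`, `cosine_similarity`, `clone_candidates`) and the threshold are all hypothetical, and the feature extraction, natural language processing and clustering steps named in the abstract are not modelled here.

```python
import math
from collections import Counter

# Hypothetical toy "metamodels": each is reduced to a bag of element names.
# Real feature extraction on Ecore metamodels would be far richer than this.
metamodels = {
    "conf_a": ["Conference", "Paper", "Author", "Review", "Paper"],
    "conf_b": ["Conference", "Paper", "Author", "Reviewer"],
    "library": ["Library", "Book", "Member", "Loan"],
}


def to_vector(features, vocabulary):
    """Map a bag of features to a count vector over a fixed vocabulary."""
    counts = Counter(features)
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def clone_candidates(vectors, threshold):
    """Return (name_a, name_b, similarity) for every pair at or above threshold."""
    names = sorted(vectors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine_similarity(vectors[a], vectors[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs


# Build one shared vocabulary, then one vector per metamodel.
vocabulary = sorted({t for feats in metamodels.values() for t in feats})
vectors = {name: to_vector(feats, vocabulary) for name, feats in metamodels.items()}

# The threshold 0.7 is an arbitrary choice for this toy data.
pairs = clone_candidates(vectors, threshold=0.7)
for a, b, sim in pairs:
    print(f"{a} ~ {b}: {sim:.3f}")
```

On this toy data only the two conference metamodels share names, so they form the single candidate pair. Exact name matching treats `Review` and `Reviewer` as unrelated, which is the kind of gap that natural language processing on names would aim to close.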

Original language: English
Pages (from-to): 57-74
Number of pages: 18
Journal: Journal of Computer Languages
Volume: 51
DOIs: 10.1016/j.cola.2018.12.002
Publication status: Published - 1 Apr 2019

Fingerprint

Vector spaces
Scalability
Feature extraction
Statistical methods
Processing

Keywords

  • Clustering
  • Domain-specific languages
  • Empirical software engineering
  • Model analytics
  • Model clone detection
  • Model-driven engineering
  • Repository mining
  • Software maintenance
  • Vector space model

Cite this

@article{bc34f65c1554466b9c23c85dc458ec11,
title = "Metamodel clone detection with SAMOS",
abstract = "Wider adoption of model-driven engineering leads to an abundance of models and metamodels in academic and industrial practice. One of the key techniques for the management and maintenance of such artifacts is model clone detection, where highly similar (meta-)models and (meta-)model fragments are mined from a typically large set of data. In this paper we have extended the SAMOS framework (Statistical Analysis of MOdelS) to clone detection, exemplified on Ecore metamodels. Our clone detection approach uses and extends the framework's feature extraction, vector space model, natural language processing and clustering capabilities. We performed three extensive case studies to demonstrate its accuracy both quantitatively and qualitatively. We first compared the sensitivity and accuracy of SAMOS for metamodel changes through mutation and scenario analysis (which simulate clones) with those of NICAD-Ecore and MACH, tools for clone detection on Ecore and UML models respectively. We then compared the precision and recall of SAMOS and of NICAD-Ecore on a real dataset, consisting of conference management metamodels from the ATL Zoo. Finally we performed a repository-wide mining of metamodel clones from GitHub. We conclude that SAMOS stands out with its higher accuracy and yet considerable scalability for further large-scale clone detection and other empirical studies on metamodels and domain specific languages.",
keywords = "Clustering, Domain-specific languages, Empirical software engineering, Model analytics, Model clone detection, Model-driven engineering, Repository mining, Software maintenance, Vector space model",
author = "{\"O}nder Babur and Loek Cleophas and {van den Brand}, Mark",
year = "2019",
month = "4",
day = "1",
doi = "10.1016/j.cola.2018.12.002",
language = "English",
volume = "51",
pages = "57--74",
journal = "Journal of Computer Languages",
issn = "2590-1184",
publisher = "Elsevier",

}

Metamodel clone detection with SAMOS. / Babur, Önder (Corresponding author); Cleophas, Loek; van den Brand, Mark.

In: Journal of Computer Languages, Vol. 51, 01.04.2019, p. 57-74.

TY - JOUR

T1 - Metamodel clone detection with SAMOS

AU - Babur, Önder

AU - Cleophas, Loek

AU - van den Brand, Mark

PY - 2019/4/1

Y1 - 2019/4/1

N2 - Wider adoption of model-driven engineering leads to an abundance of models and metamodels in academic and industrial practice. One of the key techniques for the management and maintenance of such artifacts is model clone detection, where highly similar (meta-)models and (meta-)model fragments are mined from a typically large set of data. In this paper we have extended the SAMOS framework (Statistical Analysis of MOdelS) to clone detection, exemplified on Ecore metamodels. Our clone detection approach uses and extends the framework's feature extraction, vector space model, natural language processing and clustering capabilities. We performed three extensive case studies to demonstrate its accuracy both quantitatively and qualitatively. We first compared the sensitivity and accuracy of SAMOS for metamodel changes through mutation and scenario analysis (which simulate clones) with those of NICAD-Ecore and MACH, tools for clone detection on Ecore and UML models respectively. We then compared the precision and recall of SAMOS and of NICAD-Ecore on a real dataset, consisting of conference management metamodels from the ATL Zoo. Finally we performed a repository-wide mining of metamodel clones from GitHub. We conclude that SAMOS stands out with its higher accuracy and yet considerable scalability for further large-scale clone detection and other empirical studies on metamodels and domain specific languages.

KW - Clustering

KW - Domain-specific languages

KW - Empirical software engineering

KW - Model analytics

KW - Model clone detection

KW - Model-driven engineering

KW - Repository mining

KW - Software maintenance

KW - Vector space model

UR - http://www.scopus.com/inward/record.url?scp=85065094544&partnerID=8YFLogxK

U2 - 10.1016/j.cola.2018.12.002

DO - 10.1016/j.cola.2018.12.002

M3 - Review article

AN - SCOPUS:85065094544

VL - 51

SP - 57

EP - 74

JO - Journal of Computer Languages

JF - Journal of Computer Languages

SN - 2590-1184

ER -