TY - JOUR
T1 - Metamodel clone detection with SAMOS
AU - Babur, Önder
AU - Cleophas, Loek
AU - van den Brand, Mark
PY - 2019/4/1
Y1 - 2019/4/1
N2 - Wider adoption of model-driven engineering leads to an abundance of models and metamodels in academic and industrial practice. One of the key techniques for the management and maintenance of such artifacts is model clone detection, where highly similar (meta-)models and (meta-)model fragments are mined from a typically large set of data. In this paper, we have extended the SAMOS framework (Statistical Analysis of MOdelS) to clone detection, exemplified on Ecore metamodels. Our clone detection approach uses and extends the framework's feature extraction, vector space model, natural language processing and clustering capabilities. We performed three extensive case studies to demonstrate its accuracy both quantitatively and qualitatively. We first compared the sensitivity and accuracy of SAMOS for metamodel changes through mutation and scenario analysis (which simulate clones) with those of NICAD-Ecore and MACH, tools for clone detection on Ecore and UML models, respectively. We then compared the precision and recall of SAMOS and of NICAD-Ecore on a real dataset, consisting of conference management metamodels from the ATL Zoo. Finally, we performed a repository-wide mining of metamodel clones from GitHub. We conclude that SAMOS stands out with its higher accuracy and yet considerable scalability for further large-scale clone detection and other empirical studies on metamodels and domain-specific languages.
AB - Wider adoption of model-driven engineering leads to an abundance of models and metamodels in academic and industrial practice. One of the key techniques for the management and maintenance of such artifacts is model clone detection, where highly similar (meta-)models and (meta-)model fragments are mined from a typically large set of data. In this paper, we have extended the SAMOS framework (Statistical Analysis of MOdelS) to clone detection, exemplified on Ecore metamodels. Our clone detection approach uses and extends the framework's feature extraction, vector space model, natural language processing and clustering capabilities. We performed three extensive case studies to demonstrate its accuracy both quantitatively and qualitatively. We first compared the sensitivity and accuracy of SAMOS for metamodel changes through mutation and scenario analysis (which simulate clones) with those of NICAD-Ecore and MACH, tools for clone detection on Ecore and UML models, respectively. We then compared the precision and recall of SAMOS and of NICAD-Ecore on a real dataset, consisting of conference management metamodels from the ATL Zoo. Finally, we performed a repository-wide mining of metamodel clones from GitHub. We conclude that SAMOS stands out with its higher accuracy and yet considerable scalability for further large-scale clone detection and other empirical studies on metamodels and domain-specific languages.
KW - Clustering
KW - Domain-specific languages
KW - Empirical software engineering
KW - Model analytics
KW - Model clone detection
KW - Model-driven engineering
KW - Repository mining
KW - Software maintenance
KW - Vector space model
UR - http://www.scopus.com/inward/record.url?scp=85065094544&partnerID=8YFLogxK
U2 - 10.1016/j.cola.2018.12.002
DO - 10.1016/j.cola.2018.12.002
M3 - Review article
AN - SCOPUS:85065094544
SN - 2590-1184
VL - 51
SP - 57
EP - 74
JO - Journal of Computer Languages
JF - Journal of Computer Languages
ER -