Using n-grams for the automated clustering of structural models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Model comparison and clustering are important for dealing with many models in data analysis and exploration, e.g. in domain model recovery or model repository management. Particularly in structural models, information is captured not only in model elements (e.g. in names and types) but also in the structural context, i.e. the relation of one element to the others. Some approaches handle a large number of models but ignore the structural context of model elements; others handle very few (typically two) models by applying sophisticated structural techniques. In this paper we address both aspects and extend our previous work on model clustering based on the vector space model with a technique for incorporating structural context in the form of n-grams. We compare the n-gram accuracy on two datasets of Ecore metamodels in AtlanMod Zoo: small random samples using up to trigrams and a larger one (∼100 models) up to bigrams.

Original language: English
Title of host publication: SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings
Publisher: Springer
Pages: 510-524
Number of pages: 15
ISBN (Print): 9783319519623
DOIs: https://doi.org/10.1007/978-3-319-51963-0_40
Publication status: Published - 2017
Event: 43rd Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2017), January 16-20, 2017, Limerick, Ireland
Duration: 16 Jan 2017 → 20 Jan 2017

Publication series

Name: Lecture Notes in Computer Science
Volume: 10139
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 43rd Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2017), January 16-20, 2017, Limerick, Ireland
Country: Ireland
City: Limerick
Period: 16/01/17 → 20/01/17

Fingerprint

N-gram
Structural Model
Clustering
Model
Model-based Clustering
Vector Space Model
Model Comparison
Domain Model
Metamodel
Repository
Data analysis
Recovery
Vector spaces
Context

Keywords

  • Hierarchical clustering
  • Model comparison
  • Model-driven engineering
  • N-grams
  • Vector space model

Cite this

Babur, Ö., & Cleophas, L. (2017). Using n-grams for the automated clustering of structural models. In SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings (pp. 510-524). (Lecture Notes in Computer Science; Vol. 10139). Springer. https://doi.org/10.1007/978-3-319-51963-0_40
Babur, Önder ; Cleophas, Loek. / Using n-grams for the automated clustering of structural models. SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings. Springer, 2017. pp. 510-524 (Lecture Notes in Computer Science).
@inproceedings{1006a570277c44f4befe62a5f32ecd45,
title = "Using n-grams for the automated clustering of structural models",
abstract = "Model comparison and clustering are important for dealing with many models in data analysis and exploration, e.g. in domain model recovery or model repository management. Particularly in structural models, information is captured not only in model elements (e.g. in names and types) but also in the structural context, i.e. the relation of one element to the others. Some approaches involve a large number of models ignoring the structural context of model elements; others handle very few (typically two) models applying sophisticated structural techniques. In this paper we address both aspects and extend our previous work on model clustering based on vector space model, with a technique for incorporating structural context in the form of n-grams. We compare the n-gram accuracy on two datasets of Ecore metamodels in AtlanMod Zoo: small random samples using up to trigrams and a larger one (∼100 models) up to bigrams.",
keywords = "Hierarchical clustering, Model comparison, Model-driven engineering, N-grams, Vector space model",
author = "{\"O}nder Babur and Loek Cleophas",
year = "2017",
doi = "10.1007/978-3-319-51963-0_40",
language = "English",
isbn = "9783319519623",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "510--524",
booktitle = "SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings",
address = "Germany",

}

Babur, Ö & Cleophas, L 2017, Using n-grams for the automated clustering of structural models. in SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings. Lecture Notes in Computer Science, vol. 10139, Springer, pp. 510-524, 43rd Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2017), January 16-20, 2017, Limerick, Ireland, 16/01/17. https://doi.org/10.1007/978-3-319-51963-0_40

Using n-grams for the automated clustering of structural models. / Babur, Önder; Cleophas, Loek.

SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings. Springer, 2017. p. 510-524 (Lecture Notes in Computer Science; Vol. 10139).

TY  - GEN
T1  - Using n-grams for the automated clustering of structural models
AU  - Babur, Önder
AU  - Cleophas, Loek
PY  - 2017
Y1  - 2017
N2  - Model comparison and clustering are important for dealing with many models in data analysis and exploration, e.g. in domain model recovery or model repository management. Particularly in structural models, information is captured not only in model elements (e.g. in names and types) but also in the structural context, i.e. the relation of one element to the others. Some approaches involve a large number of models ignoring the structural context of model elements; others handle very few (typically two) models applying sophisticated structural techniques. In this paper we address both aspects and extend our previous work on model clustering based on vector space model, with a technique for incorporating structural context in the form of n-grams. We compare the n-gram accuracy on two datasets of Ecore metamodels in AtlanMod Zoo: small random samples using up to trigrams and a larger one (∼100 models) up to bigrams.
AB  - Model comparison and clustering are important for dealing with many models in data analysis and exploration, e.g. in domain model recovery or model repository management. Particularly in structural models, information is captured not only in model elements (e.g. in names and types) but also in the structural context, i.e. the relation of one element to the others. Some approaches involve a large number of models ignoring the structural context of model elements; others handle very few (typically two) models applying sophisticated structural techniques. In this paper we address both aspects and extend our previous work on model clustering based on vector space model, with a technique for incorporating structural context in the form of n-grams. We compare the n-gram accuracy on two datasets of Ecore metamodels in AtlanMod Zoo: small random samples using up to trigrams and a larger one (∼100 models) up to bigrams.
KW  - Hierarchical clustering
KW  - Model comparison
KW  - Model-driven engineering
KW  - N-grams
KW  - Vector space model
UR  - http://www.scopus.com/inward/record.url?scp=85010689255&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-51963-0_40
DO  - 10.1007/978-3-319-51963-0_40
M3  - Conference contribution
AN  - SCOPUS:85010689255
SN  - 9783319519623
T3  - Lecture Notes in Computer Science
SP  - 510
EP  - 524
BT  - SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings
PB  - Springer
ER  -

Babur Ö, Cleophas L. Using n-grams for the automated clustering of structural models. In SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings. Springer. 2017. p. 510-524. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-319-51963-0_40