Improving custom-tailored variability mining using outlier and cluster detection

David Wille, Önder Babur, Loek Cleophas, Christoph Seidl, Mark van den Brand, Ina Schaefer

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

To satisfy the demand for customized software solutions, companies commonly use so-called clone-and-own approaches: they reuse functionality by copying existing realization artifacts and modifying them to create new product variants. Lacking clear documentation of the variability relations (i.e., the common and varying parts), the resulting variants have to be developed, maintained and evolved in isolation. In previous work, we introduced a semi-automatic mining algorithm that allows custom-tailored identification of distinct variability relations for block-based model variants (e.g., MATLAB/Simulink models or statecharts) using user-adjustable metrics. However, variants that are completely unrelated to the other variants (i.e., outliers) can reduce the usefulness of the generated variability relations for developers maintaining the variants (e.g., erroneous relations might be identified). In addition, splitting the compared models into smaller sets (i.e., clusters) can be useful to give developers separate viewpoints on different variable system features. In further previous work, we proposed a statistical clustering approach capable of identifying such outliers and clusters. The contribution of this paper is twofold. First, we present guidelines and a generic implementation that both ease adapting our variability mining algorithm to new languages. Second, we integrate our clustering approach as a preprocessing step for the mining. This allows users to remove outliers before executing variability mining on the suggested clusters. Using models from two industrial case studies, we show the feasibility of the approach and discuss how our clustering can support our variability mining in identifying sensible variability information.
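The preprocessing idea summarized above can be illustrated with a minimal Python sketch. The idea is to cluster the variants, set aside outliers, and then mine common and varying parts within each suggested cluster. This is not the authors' implementation. It assumes each model variant is reduced to a plain set of element names. It also swaps in Jaccard similarity with single-linkage threshold clustering in place of the statistical clustering and the user-adjustable comparison metrics described in the paper. The functions `jaccard`, `cluster_variants` and `mine_cluster`, the variant data, and the threshold of 0.5 are all hypothetical.

```python
from itertools import combinations

# Hypothetical simplification: each model variant is a set of element names.
Variants = dict[str, frozenset[str]]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two element sets (stand-in for the paper's metrics)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_variants(variants: Variants, threshold: float):
    """Single-linkage threshold clustering via union-find.

    Two variants are linked when their similarity is at least `threshold`;
    connected components with two or more members are suggested clusters,
    singleton components are reported as outliers.
    """
    parent = {name: name for name in variants}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(variants, 2):
        if jaccard(variants[a], variants[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in variants:
        groups.setdefault(find(name), []).append(name)
    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    outliers = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, outliers


def mine_cluster(variants: Variants, members: list[str]):
    """Split one cluster's elements into common and varying parts."""
    sets = [variants[m] for m in members]
    common = frozenset.intersection(*sets)
    varying = frozenset.union(*sets) - common
    return common, varying


# Hypothetical Simulink-like variants (block types only).
variants: Variants = {
    "V1": frozenset({"Gain", "Sum", "Integrator", "Scope"}),
    "V2": frozenset({"Gain", "Sum", "Integrator", "Saturation"}),
    "V3": frozenset({"Gain", "Sum", "Integrator", "Scope", "Delay"}),
    "V4": frozenset({"Lookup", "Switch"}),
}

clusters, outliers = cluster_variants(variants, threshold=0.5)
for members in clusters:  # outliers are simply left out of the mining step
    common, varying = mine_cluster(variants, members)
    print(members, "common:", sorted(common), "varying:", sorted(varying))
print("outliers:", outliers)
```

With these hypothetical inputs, V4 shares no elements with the other variants and is flagged as an outlier. V1 to V3 form one cluster whose common part is {Gain, Integrator, Sum} and whose varying part is {Delay, Saturation, Scope}. A real comparison of block-based models would also match connections and attributes, which this set-based simplification ignores.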

Original language: English
Pages (from-to): 62-84
Number of pages: 23
Journal: Science of Computer Programming
Volume: 163
DOI: 10.1016/j.scico.2018.04.002
Publication status: Published, 1 Oct 2018

Keywords

  • Block-based language
  • Clone-and-own
  • Conceptual framework
  • Outlier and cluster detection
  • Variability mining

Cite this

@article{bc5780b242e34ebaae367877c8386411,
title = "Improving custom-tailored variability mining using outlier and cluster detection",
keywords = "Block-based language, Clone-and-own, Conceptual framework, Outlier and cluster detection, Variability mining",
author = "David Wille and {\"O}nder Babur and Loek Cleophas and Christoph Seidl and {van den Brand}, Mark and Ina Schaefer",
year = "2018",
month = oct,
day = "1",
doi = "10.1016/j.scico.2018.04.002",
language = "English",
volume = "163",
pages = "62--84",
journal = "Science of Computer Programming",
issn = "0167-6423",
publisher = "Elsevier",

}

Improving custom-tailored variability mining using outlier and cluster detection. / Wille, David; Babur, Önder; Cleophas, Loek; Seidl, Christoph; van den Brand, Mark; Schaefer, Ina.

In: Science of Computer Programming, Vol. 163, 01.10.2018, p. 62-84.

TY  - JOUR
T1  - Improving custom-tailored variability mining using outlier and cluster detection
AU  - Wille, David
AU  - Babur, Önder
AU  - Cleophas, Loek
AU  - Seidl, Christoph
AU  - van den Brand, Mark
AU  - Schaefer, Ina
PY  - 2018/10/1
Y1  - 2018/10/1

KW  - Block-based language
KW  - Clone-and-own
KW  - Conceptual framework
KW  - Outlier and cluster detection
KW  - Variability mining
UR  - http://www.scopus.com/inward/record.url?scp=85046631676&partnerID=8YFLogxK
U2  - 10.1016/j.scico.2018.04.002
DO  - 10.1016/j.scico.2018.04.002
M3  - Article
AN  - SCOPUS:85046631676
VL  - 163
SP  - 62
EP  - 84
JO  - Science of Computer Programming
JF  - Science of Computer Programming
SN  - 0167-6423
ER  -