Singling the odd ones out: A novelty detection approach to find defects in infrastructure-as-code

Stefano Dalla Palma, Majid Mohammadi, Dario Di Nucci, Damian A. Tamburri

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

2 Citations (Scopus)

Abstract

Infrastructure-as-Code (IaC) is increasingly adopted. However, little is known about how best to maintain and evolve it. Previous studies focused on defining Machine-Learning models to predict defect-prone blueprints using supervised binary classification. This class of techniques uses both defective and non-defective instances in the training phase. Furthermore, the high imbalance between defective and non-defective samples makes training more difficult and leads to unreliable classifiers. In this work, we tackle the defect-prediction problem from a different perspective using novelty detection. We evaluate the performance of three techniques, namely OneClassSVM, LocalOutlierFactor, and IsolationForest, and compare it with that of a baseline RandomForest binary classifier. The novelty detection models are trained using only non-defective samples: defective data points are treated as novelties because the number of defective samples is too small compared to non-defective ones. We conduct an empirical study on an extremely imbalanced dataset consisting of 85 real-world Ansible projects containing only small amounts of defective instances. We found that novelty detection techniques can recognize defects with a high level of precision and recall, an AUC-PR up to 0.86, and an MCC up to 0.31. We believe our results can influence the current trends in defect detection and put forward a new research path toward dealing with this problem.
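The setup described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. It fits the three named detectors on non-defective samples only and scores a small, imbalanced test set. The data below is synthetic and purely hypothetical (it is not the Ansible dataset from the study), the hyperparameters are arbitrary, and `average_precision_score` is used as a common approximation of AUC-PR, so the printed numbers say nothing about the paper's results.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, matthews_corrcoef
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-script metrics: 4 numeric features.
X_train = rng.normal(0.0, 1.0, size=(500, 4))          # non-defective only
X_test_ok = rng.normal(0.0, 1.0, size=(190, 4))        # non-defective
X_test_defective = rng.normal(4.0, 1.0, size=(10, 4))  # rare "defective" points
X_test = np.vstack([X_test_ok, X_test_defective])
y_test = np.array([0] * 190 + [1] * 10)                # 1 = defective

# Hyperparameters are illustrative assumptions, not the study's settings.
detectors = {
    "OneClassSVM": OneClassSVM(gamma="scale", nu=0.05),
    # novelty=True lets LocalOutlierFactor score unseen data.
    "LocalOutlierFactor": LocalOutlierFactor(novelty=True),
    "IsolationForest": IsolationForest(random_state=0),
}

results = {}
for name, detector in detectors.items():
    # Training sees only non-defective samples.
    detector.fit(X_train)
    # predict() returns +1 for inliers and -1 for novelties.
    y_pred = (detector.predict(X_test) == -1).astype(int)
    # decision_function() is higher for inliers; negate so that
    # a higher score means "more likely defective".
    scores = -detector.decision_function(X_test)
    results[name] = {
        "auc_pr": average_precision_score(y_test, scores),
        "mcc": matthews_corrcoef(y_test, y_pred),
    }
    print(f"{name}: AUC-PR={results[name]['auc_pr']:.2f}, "
          f"MCC={results[name]['mcc']:.2f}")
```

Because the detectors never see a defective sample during training, the class imbalance affects only the evaluation step, which is why threshold-free (AUC-PR) and imbalance-aware (MCC) metrics are reported here.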

Original language: English
Title: MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020
Editors: Foutse Khomh, Pasquale Salza, Gemma Catolino
Publisher: Association for Computing Machinery, Inc
Pages: 31-36
Number of pages: 6
ISBN (electronic): 9781450381246
DOIs:
Status: Published - 13 Nov 2020
Event: 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020 - Virtual, Online, United States of America
Duration: 13 Nov 2020 → …

Conference

Conference: 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Country/Region: United States of America
City: Virtual, Online
Period: 13/11/20 → …

Funding

This work is supported by the European Commission grant no. 825040 (RADON H2020).

Funder: Funder number
European Union’s Horizon Europe research and innovation programme: 825040
European Commission: RADON H2020
