Singling the odd ones out: A novelty detection approach to find defects in infrastructure-as-code

Stefano Dalla Palma, Majid Mohammadi, Dario Di Nucci, Damian A. Tamburri

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed

Abstract

Infrastructure-as-Code (IaC) is increasingly adopted. However, little is known about how best to maintain and evolve it. Previous studies focused on defining machine-learning models to predict defect-prone blueprints using supervised binary classification. This class of techniques uses both defective and non-defective instances in the training phase. Furthermore, the high imbalance between defective and non-defective samples makes training more difficult and leads to unreliable classifiers. In this work, we tackle the defect-prediction problem from a different perspective using novelty detection. We evaluate the performance of three techniques, namely OneClassSVM, LocalOutlierFactor, and IsolationForest, and compare it with that of a baseline RandomForest binary classifier. The novelty detection models are trained using only non-defective samples: defective data points are treated as novelties because the number of defective samples is too small compared to non-defective ones. We conduct an empirical study on an extremely imbalanced dataset consisting of 85 real-world Ansible projects containing only a small number of defective instances. We found that novelty detection techniques can recognize defects with a high level of precision and recall, an AUC-PR up to 0.86, and an MCC up to 0.31. We believe our results can influence the current trends in defect detection and put forward a new research path toward dealing with this problem.
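The core idea in the abstract, fitting a model on non-defective samples only and flagging anything unusual as a potential defect, can be illustrated with a minimal sketch. This is not the authors' pipeline: the detector below is a simple per-feature z-score rule used as a dependency-free stand-in for OneClassSVM, LocalOutlierFactor, or IsolationForest (names that match scikit-learn estimator classes). The two features, the toy data, and the threshold of 3.0 are all hypothetical and chosen only to keep the example self-contained. The MCC function follows the standard confusion-matrix definition.

```python
import math
import statistics


def fit(normal_rows):
    """Learn per-feature mean and spread from non-defective rows only."""
    columns = list(zip(*normal_rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero spread so the z-score division is always defined.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs


def novelty_score(model, row):
    """Largest absolute z-score across features; higher means more unusual."""
    means, stdevs = model
    return max(abs(x - m) / s for x, m, s in zip(row, means, stdevs))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical features per script: (lines_of_code, num_tasks).
# Training data contains non-defective samples only.
train_normal = [(40 + i % 21, 3 + i % 5) for i in range(200)]

# Imbalanced toy test set: 95 non-defective rows and 5 defective rows.
# The last defective row looks like a normal one, so this detector misses it.
test_normal = [(40 + (7 * i) % 21, 3 + (3 * i) % 5) for i in range(95)]
test_defective = [(120, 15), (95, 6), (55, 14), (110, 12), (50, 5)]

THRESHOLD = 3.0  # assumed cut-off; a real study would tune this

model = fit(train_normal)
y_true = [0] * len(test_normal) + [1] * len(test_defective)
y_pred = [
    1 if novelty_score(model, row) > THRESHOLD else 0
    for row in test_normal + test_defective
]
mcc_value = mcc(y_true, y_pred)
print(f"flagged {sum(y_pred)} of {len(y_pred)} rows, MCC = {mcc_value:.2f}")
```

Because the detector never sees a defective sample during fitting, class imbalance in the training data does not affect it. Imbalance still matters at evaluation time, which is one reason metrics such as MCC and AUC-PR are commonly preferred over plain accuracy in this setting.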

Original language: English
Title: MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020
Editors: Foutse Khomh, Pasquale Salza, Gemma Catolino
Publisher: Association for Computing Machinery, Inc
Pages: 31-36
Number of pages: 6
ISBN (electronic): 9781450381246
Status: Published - 13 Nov 2020
Event: 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020 - Virtual, Online, United States of America
Duration: 13 Nov 2020 → …

Conference

Conference: 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Country/Region: United States of America
City: Virtual, Online
Period: 13/11/20 → …
