Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this article, we present an approach that combines heuristics stemming from practical experience with machine learning (ML), specifically natural language processing (NLP), to address this problem. Our method consists of three phases. First, we construct an advisory record object containing key information about a vulnerability that is extracted from an advisory, such as those found in the National Vulnerability Database (NVD). These advisories are expressed in natural language. Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that can be identified as unrelated to the vulnerability at hand. Finally, for each of the remaining candidate commits, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. Based on the values of these feature vectors, our method produces a ranked list of candidate fix commits. The score attributed by the ML model to each feature is kept visible to the users, allowing them to easily interpret the predictions.

We implemented our approach and evaluated it on an open data set, built by manual curation, that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03% of the vulnerabilities (with a fix commit in the first position for 65.06% of the vulnerabilities). Our evaluation shows that our method can considerably reduce the manual effort needed to search open-source software (OSS) repositories for the commits that fix known vulnerabilities.
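The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class and function names, the time-window filter, the three features, and the hand-picked weights are all assumptions made for illustration, and a fixed weighted sum stands in for the trained ML model. The sketch only shows how a filter, a feature vector, and a ranking with visible per-feature scores fit together.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AdvisoryRecord:
    """Phase 1: key facts extracted from a natural-language advisory."""
    vuln_id: str
    description: str
    published: datetime
    keywords: set = field(default_factory=set)


@dataclass
class Commit:
    sha: str
    message: str
    timestamp: datetime
    changed_files: list


def tokenize(text):
    # Hypothetical tokenizer: lowercase words of four or more letters.
    return set(re.findall(r"[a-z_]{4,}", text.lower()))


def build_advisory_record(vuln_id, description, published):
    return AdvisoryRecord(vuln_id, description, published, tokenize(description))


def filter_candidates(advisory, commits, window_days=180):
    """Phase 2: drop commits that a heuristic marks as unrelated.

    Assumed heuristic: keep only commits made within a time window
    around the advisory's publication date.
    """
    window = timedelta(days=window_days)
    return [c for c in commits if abs(c.timestamp - advisory.published) <= window]


def feature_vector(advisory, commit):
    """Phase 3a: numeric features describing how well a commit matches."""
    message = commit.message.lower()
    code_suffixes = (".java", ".py", ".c")  # assumed set of source file types
    return {
        "mentions_vuln_id": float(advisory.vuln_id.lower() in message),
        "keyword_overlap": float(len(advisory.keywords & tokenize(message))),
        "touches_code": float(any(f.endswith(code_suffixes) for f in commit.changed_files)),
    }


# Hand-picked weights standing in for a trained model (assumption).
WEIGHTS = {"mentions_vuln_id": 3.0, "keyword_overlap": 0.5, "touches_code": 1.0}


def rank_commits(advisory, commits):
    """Phase 3b: rank candidates, keeping each feature's contribution visible."""
    ranked = []
    for commit in filter_candidates(advisory, commits):
        features = feature_vector(advisory, commit)
        contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
        ranked.append((commit.sha, sum(contributions.values()), contributions))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Each entry returned by `rank_commits` is a `(sha, score, contributions)` tuple, so a reader of the ranking can see which features drove a commit's position, in the spirit of the per-feature scores the abstract mentions.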

Original language: English
Article number: 134
Number of pages: 28
Journal: ACM Transactions on Software Engineering and Methodology
Volume: 33
Issue number: 5
Publication status: Published - 4 Jun 2024

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Funding

Funder: European Union's Horizon 2020 - Research and Innovation Framework Programme
Funder number: 952647

Keywords

• code-level vulnerability data
• common vulnerabilities and exposures (CVE)
• machine learning applied to software security
• mining software repositories
• National Vulnerability Database (NVD)
• Open source software
• software security
