Abstract
Automated Program Repair (APR) has advanced significantly with the emergence of pre-trained Code Language Models (CLMs), which enable the generation of high-quality patches. However, selecting the most suitable CLM for APR remains challenging because the relevant factors, including accuracy, efficiency, and scalability, are interdependent and interact in complex ways. This study systematically evaluates 20 pre-trained CLMs, ranging from 60M to 16B parameters, on the HumanEval-Java benchmark (163 buggy Java methods). The evaluation examines bug-fixing accuracy, resource consumption, compilability, patch diversity, and sampling strategies (beam search vs. nucleus sampling). Results indicate that larger models such as CodeLLaMA-13B and StarCoder generally perform better in bug fixing and compiler error handling, but scale alone does not guarantee effectiveness: some models (e.g., CodeGen2) perform poorly despite their size. Notably, memory usage increases with model size, whereas time consumption shows no clear correlation with size, suggesting that efficiency is influenced by architecture rather than scale alone. Additionally, nucleus sampling slightly outperforms beam search, though the difference is not statistically significant. Since no single CLM fixes all bugs, these findings highlight the potential of hybrid or ensemble-based CLM-driven APR approaches for more robust bug fixing.
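The abstract contrasts beam search with nucleus (top-p) sampling. As a minimal illustration of the latter, and not the study's actual decoding code, the sketch below filters a toy next-token distribution down to the smallest set of tokens whose cumulative probability reaches `top_p`, renormalizes it, and samples from it. The function names `nucleus_filter` and `sample_token`, the `top_p` values, and the toy probabilities are all assumptions made for illustration.

```python
import random


def nucleus_filter(probs, top_p=0.9):
    """Return the renormalized nucleus of a token distribution.

    The nucleus is the smallest set of highest-probability tokens whose
    cumulative probability is at least top_p.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_token(probs, top_p=0.9, rng=None):
    """Draw one token from the nucleus of the distribution."""
    rng = rng or random.Random()
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token probabilities, not taken from any model.
    toy_probs = {"return": 0.5, "if": 0.3, "for": 0.15, "while": 0.05}
    print(nucleus_filter(toy_probs, top_p=0.75))
    print(sample_token(toy_probs, top_p=0.75, rng=random.Random(0)))
```

Beam search, by contrast, deterministically keeps the highest-scoring partial sequences at each step, so repeated runs yield the same candidate patches, whereas sampling from the nucleus can yield different patches on each run.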
| Original language | English |
|---|---|
| Title of host publication | GPCE '25 |
| Subtitle of host publication | Proceedings of the 24th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences |
| Editors | Amir Shaikhha, Sebastian Erdweg, Nada Amin |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery, Inc. |
| Pages | 13-26 |
| Number of pages | 14 |
| ISBN (Electronic) | 979-8-4007-1995-0 |
| Publication status | Published - 27 Jun 2025 |
| Event | 24th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2025 with ECOOP 2025 - Bergen, Norway. Duration: 3 Jul 2025 → 4 Jul 2025 |
Conference
| Conference | 24th ACM SIGPLAN International Conference on Generative Programming |
|---|---|
| Abbreviated title | GPCE 2025 |
| Country/Territory | Norway |
| City | Bergen |
| Period | 3/07/25 → 4/07/25 |
Keywords
- Automated program repair
- Java
- empirical study
- pre-trained code language model
- zero-shot learning
Title
Comparative Analysis of Pre-Trained Code Language Models for Automated Program Repair via Code Infill Generation