Abstract
Autoencoder models of source code are an emerging alternative to autoregressive large language models with important benefits for genetic improvement of software. We hypothesize that encoder-decoder architectures are suboptimal for source code because they ignore the grammatical structure that can be derived with an Abstract Syntax Tree parser. We propose a structured Variational Auto-Encoder based on TreeLSTM that operates directly on the AST. We train it along with a baseline sequence VAE on a dataset of competitive programming submissions. We find the structured model to perform better in most tests, with some notable exceptions. These findings suggest structured autoencoder models could enable more effective generation and manipulation of source code for tasks like automated bug fixing and generative programming.
| Original language | English |
|---|---|
| Article number | 10886936 |
| Pages (from-to) | 30262-30273 |
| Number of pages | 12 |
| Journal | IEEE Access |
| Volume | 13 |
| DOIs | |
| Status | Published - 2025 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.