Integrating genetic algorithms and language models for enhanced enzyme design

Yves Gaetan Nana Teukam (Corresponding author), Federico Zipoli, Teodoro Laino, Emanuele Criscuolo, Francesca Grisoni, Matteo Manica

Research output: Contribution to journal › Journal article › Academic › peer-reviewed

1 Citation (Scopus)
15 Downloads (Pure)

Abstract

Enzymes are molecular machines optimized by nature to allow otherwise impossible chemical processes to occur. Their design is a challenging task due to the complexity of the protein space and the intricate relationships between sequence, structure, and function. Recently, large language models (LLMs) have emerged as powerful tools for modeling and analyzing biological sequences, but their application to protein design is limited by the high cardinality of the protein space. This study introduces a framework that combines LLMs with genetic algorithms (GAs) to optimize enzymes. LLMs are trained on a large dataset of protein sequences to learn relationships between amino acid residues linked to structure and function. This knowledge is then leveraged by GAs to efficiently search for sequences with improved catalytic performance. We focused on two optimization tasks: improving the feasibility of biochemical reactions and increasing their turnover rate. Systematic evaluations on 105 biocatalytic reactions demonstrated that the LLM–GA framework generated mutants outperforming the wild-type enzymes in terms of feasibility in 90% of the instances. Further in-depth evaluation of seven reactions reveals the power of this methodology to make “the best of both worlds” and create mutants with structural features and flexibility comparable with the wild types. Our approach advances the state-of-the-art computational design of biocatalysts, ultimately opening opportunities for more sustainable chemical processes.
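
The search loop described in the abstract can be illustrated with a minimal genetic-algorithm sketch. This is not the authors' implementation: the scoring function `toy_score`, the target sequence, and all parameter values are hypothetical stand-ins for the language-model-based scoring of feasibility or turnover rate that the paper describes. The sketch only shows the general pattern of mutation, crossover, and selection over amino acid sequences, starting from a wild type.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(seq, target):
    """Hypothetical scorer: fraction of positions that match a target sequence.

    Assumption: in the framework described above, this role would be played
    by a model-derived score, not by a known target.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(wild_type, score, generations=50, pop_size=40, rate=0.05, seed=0):
    """Evolve mutants of `wild_type`, keeping the top half each generation."""
    rng = random.Random(seed)
    population = [wild_type] + [
        mutate(wild_type, rate, rng) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # elitism: survivors carry over
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    target = "MKTAYIAKQRQI"  # hypothetical optimum, for illustration only
    wild = "MKTAAAAAAAQI"    # hypothetical wild type
    best = evolve(wild, lambda s: toy_score(s, target))
    print(best, toy_score(best, target))
```

Because the top half of each generation is carried over unchanged and the wild type is part of the initial population, the returned sequence never scores below the wild type under the chosen scorer.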

Original language: English
Article number: bbae675
Number of pages: 9
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 1
Early online date: 8 Jan 2025
Status: Published - Jan 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025. Published by Oxford University Press.

Funding

This publication was created as part of NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation. F.G. and E.C. acknowledge the support by the European Union (ERC, ReMINDER, 101077879, to F.G.). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
