Integrating genetic algorithms and language models for enhanced enzyme design

Yves Gaetan Nana Teukam (Corresponding author), Federico Zipoli, Teodoro Laino, Emanuele Criscuolo, Francesca Grisoni, Matteo Manica

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Enzymes are molecular machines optimized by nature to allow otherwise impossible chemical processes to occur. Their design is a challenging task due to the complexity of the protein space and the intricate relationships between sequence, structure, and function. Recently, large language models (LLMs) have emerged as powerful tools for modeling and analyzing biological sequences, but their application to protein design is limited by the high cardinality of the protein space. This study introduces a framework that combines LLMs with genetic algorithms (GAs) to optimize enzymes. LLMs are trained on a large dataset of protein sequences to learn relationships between amino acid residues that are linked to structure and function. The GAs then leverage this knowledge to search efficiently for sequences with improved catalytic performance. We focused on two optimization tasks: improving the feasibility of biochemical reactions and increasing their turnover rate. Systematic evaluations on 105 biocatalytic reactions demonstrated that the LLM–GA framework generated mutants outperforming the wild-type enzymes in terms of feasibility in 90% of the instances. An in-depth evaluation of seven reactions further showed that the methodology combines “the best of both worlds”, creating mutants whose structural features and flexibility are comparable to those of the wild types. Our approach advances the state of the art in the computational design of biocatalysts, ultimately opening opportunities for more sustainable chemical processes.
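The abstract states that a GA exploits what an LLM has learned about residue relationships, but it does not say how the two are coupled. The sketch below assumes one plausible coupling: the language model acts as the mutation proposal distribution inside an ordinary elitist GA, and a separate scoring function rates each mutant. Everything here is a hypothetical placeholder, not the authors' implementation: `toy_lm_probs` stands in for a protein LLM, `toy_feasibility` stands in for a reaction-feasibility or turnover predictor, and the sequences and parameter values are invented for illustration.

```python
import random
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_lm_probs(seq: str, pos: int) -> Dict[str, float]:
    """HYPOTHETICAL stand-in for a protein language model.
    A real model would condition on the context around `pos`; this toy
    ignores `pos` and favours residues already frequent in `seq`."""
    counts = {aa: 1.0 + seq.count(aa) for aa in AMINO_ACIDS}
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def lm_guided_mutation(seq: str, lm_probs: Callable[[str, int], Dict[str, float]],
                       rng: random.Random, n_sites: int = 1) -> str:
    """Resample `n_sites` random positions from the model's distribution."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_sites):
        probs = lm_probs("".join(chars), pos)
        aas = list(probs)
        chars[pos] = rng.choices(aas, weights=[probs[a] for a in aas], k=1)[0]
    return "".join(chars)

def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def run_ga(wild_type: str, fitness: Callable[[str], float],
           lm_probs: Callable[[str, int], Dict[str, float]],
           pop_size: int = 20, generations: int = 30, elite: int = 4,
           seed: int = 0) -> str:
    rng = random.Random(seed)
    # Seed the population with the wild type plus LLM-proposed mutants.
    population: List[str] = [wild_type] + [
        lm_guided_mutation(wild_type, lm_probs, rng) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = list(parents)  # elitism: the best sequences survive unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(lm_guided_mutation(crossover(p1, p2, rng), lm_probs, rng))
        population = children
    return max(population, key=fitness)

# HYPOTHETICAL objective: similarity to an arbitrary target sequence.
TARGET = "MKTAYIAKQR"

def toy_feasibility(seq: str) -> float:
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)

wild_type = "MATAYLAGQR"
best = run_ga(wild_type, toy_feasibility, toy_lm_probs)
print(best, toy_feasibility(wild_type), toy_feasibility(best))
```

Because the wild type enters the initial population and elitism carries the top sequences forward unchanged, the returned mutant can never score below the wild type under the chosen objective. Whether a mutant actually beats the wild type, as the paper reports for 90% of its reactions, depends entirely on the real scoring model, which this sketch does not reproduce.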

Original language: English
Article number: bbae675
Number of pages: 9
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 1
Early online date: 8 Jan 2025
Publication status: Published - Jan 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025. Published by Oxford University Press.

Keywords

  • biocatalysis
  • computational protein design
  • enzyme optimization
  • genetic algorithms
  • large language models
  • Algorithms
  • Biocatalysis
  • Protein Engineering/methods
  • Computational Biology/methods
  • Enzymes/chemistry
