Abstract
De novo drug design is a key step in pharmaceutical research, and it involves creating molecular candidates with specific properties from scratch. The vast ‘chemical universe’, estimated to contain up to 10^60 drug-like molecular entities, poses a significant challenge to de novo design. Chemical language models (CLMs) are one subfield of deep learning that has revolutionized de novo drug design. CLMs borrow methods from natural language processing and adapt them to molecules represented as strings (e.g., SMILES strings). This chapter introduces the core elements of chemical language modeling applied to de novo design, including the representation of molecules as strings, as well as strategies for model training and molecule generation. After describing the main approaches and concepts, it also provides a hands-on example of CLM-driven de novo design with Python and Keras.
Field | Value |
---|---|
Original language | English |
Title of host publication | An Introduction to Generative Drug Discovery |
Publisher | CRC Press |
Pages | 45-66 |
Number of pages | 22 |
ISBN (Electronic) | 9781040271308 |
ISBN (Print) | 9781032506234 |
Publication status | Published - 1 Jan 2025 |
Bibliographical note
Publisher Copyright: © 2025 selection and editorial matter, Sean Ekins.