De Novo Drug Design by Chemical Language Modeling

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

Abstract

De novo drug design, a key step in pharmaceutical research, involves creating molecular candidates with specific properties from scratch. The vast 'chemical universe', estimated to contain up to 10^60 drug-like molecular entities, poses a significant challenge to de novo design. Chemical language models (CLMs) belong to a subfield of deep learning that has revolutionized de novo drug design. CLMs borrow methods from natural language processing and adapt them to molecules represented as strings (e.g., SMILES strings). This chapter introduces the core elements of chemical language modeling applied to de novo design, including the representation of molecules as strings and strategies for model training and molecule generation. After describing the main approaches and concepts, it provides a hands-on example of CLM-driven de novo design with Python and Keras.
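The abstract names three ingredients of a CLM pipeline: molecules written as strings, model training, and molecule generation. The Python sketch below is not the chapter's code (its hands-on example uses Keras); it only illustrates, under stated assumptions, the string handling that surrounds such a model: splitting SMILES into tokens, mapping tokens to integer ids with start, end, and padding markers, and sampling a new string token by token with a temperature parameter. The regular expression, the special tokens `<pad>`, `<bos>` and `<eos>`, and the `next_token_probs` callable that stands in for a trained network are all hypothetical choices made for illustration.

```python
import math
import random
import re

# Hypothetical special tokens (illustrative choices, not taken from the chapter).
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"

# Assumed SMILES token pattern: bracket atoms such as [NH3+], two-letter
# halogens Br/Cl, %NN ring closures, single-letter (aromatic) atoms,
# bond/branch/charge symbols, and single-digit ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#\-\+\\/\(\)\.~:@\*\$\?>]|\d)"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; reject characters the pattern skips."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Map every token seen in the training set (plus special tokens) to an id."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate([PAD, BOS, EOS] + tokens)}


def encode(smiles, vocab, max_len):
    """Wrap a SMILES string in <bos>/<eos>, map tokens to ids, pad to max_len."""
    ids = [vocab[BOS]] + [vocab[tok] for tok in tokenize(smiles)] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError(f"sequence of length {len(ids)} exceeds max_len={max_len}")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def sample_with_temperature(probs, temperature, rng):
    """Draw one index after reweighting p_i -> p_i ** (1 / temperature).

    Computed in log space for numerical stability; zero-probability tokens stay
    impossible. Temperatures below 1 sharpen the distribution, above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(p) / temperature if p > 0 else -math.inf for p in probs]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


def generate(next_token_probs, vocab, max_len=100, temperature=1.0, seed=None):
    """Sample one string token by token until <eos> or max_len is reached.

    `next_token_probs` is a hypothetical callable standing in for a trained
    CLM: given the ids generated so far, it returns one probability per
    vocabulary entry for the next token.
    """
    rng = random.Random(seed)
    id_to_token = {i: tok for tok, i in vocab.items()}
    ids = [vocab[BOS]]
    while len(ids) < max_len:
        probs = list(next_token_probs(ids))
        probs[vocab[PAD]] = probs[vocab[BOS]] = 0.0  # never emit these mid-string
        next_id = sample_with_temperature(probs, temperature, rng)
        if next_id == vocab[EOS]:
            break
        ids.append(next_id)
    return "".join(id_to_token[i] for i in ids[1:])


if __name__ == "__main__":
    training_smiles = ["CCO", "c1ccccc1", "CC(=O)Cl", "C[NH3+]"]
    vocab = build_vocab(training_smiles)
    print(tokenize("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
    print(encode("CCO", vocab, max_len=8))  # [1, 7, 7, 9, 2, 0, 0, 0]

    def toy_model(ids):
        # Hypothetical stand-in for a trained network: deterministically spells "CCO".
        script = [vocab["C"], vocab["C"], vocab["O"], vocab[EOS]]
        probs = [0.0] * len(vocab)
        probs[script[len(ids) - 1]] = 1.0
        return probs

    print(generate(toy_model, vocab, temperature=0.8, seed=0))  # CCO
```

In a real setting, `toy_model` would be replaced by a trained sequence model (for example a recurrent network built with Keras) that outputs a probability distribution over the vocabulary at each step. Sampled strings are not guaranteed to describe valid molecules, so they are usually checked with a cheminformatics toolkit before further use.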

Original language: English
Title of host publication: An Introduction to Generative Drug Discovery
Publisher: CRC Press
Pages: 45-66
Number of pages: 22
ISBN (Electronic): 9781040271308
ISBN (Print): 9781032506234
Publication status: Published - 1 Jan 2025

Bibliographical note

Publisher Copyright: © 2025 selection and editorial matter, Sean Ekins.
