Blocked inverted indices for exact clustering of large chemical spaces

P. Thiel, L. Sach-Peltason, C. Ottmann, O. Kohlbacher

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Efficient methods for calculating compound similarities are of utmost importance for applications such as similarity searching and library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable but often computationally prohibitive. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given dataset. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space can thus be done on modern multicore machines within a few days.
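To illustrate the general idea behind inverted-index similarity calculation, the following is a minimal Python sketch. It is not the authors' implementation: it omits the blocking and multicore parallelization that the abstract describes, and the function names (`build_inverted_index`, `all_pairs_tanimoto`) and the representation of fingerprints as sets of on-bit positions are assumptions made for illustration. The sketch relies only on the standard Tanimoto definition, shared bits divided by the size of the union.

```python
from collections import defaultdict


def build_inverted_index(fingerprints):
    """Map each on-bit to the list of compound ids whose fingerprint sets it."""
    index = defaultdict(list)
    for cid, bits in enumerate(fingerprints):
        for bit in bits:
            index[bit].append(cid)
    return index


def all_pairs_tanimoto(fingerprints, threshold=0.0):
    """Return {(i, j): similarity} for i < j.

    Only pairs sharing at least one on-bit are visited, so pairs with
    similarity 0 never appear in the result (hypothetical design choice).
    """
    fps = [set(fp) for fp in fingerprints]
    sizes = [len(fp) for fp in fps]
    index = build_inverted_index(fps)

    result = {}
    for i, bits in enumerate(fps):
        # Count shared on-bits with every later compound via the posting lists.
        shared = defaultdict(int)
        for bit in bits:
            for j in index[bit]:
                if j > i:
                    shared[j] += 1
        # Tanimoto = |A ∩ B| / (|A| + |B| - |A ∩ B|)
        for j, common in shared.items():
            similarity = common / (sizes[i] + sizes[j] - common)
            if similarity >= threshold:
                result[(i, j)] = similarity
    return result
```

For example, calling `all_pairs_tanimoto([{1, 2, 3}, {2, 3, 4}, {7, 8}])` visits only the pair (0, 1), which shares two bits out of a union of four; the third compound shares no bits with the others and is never compared. Skipping such pairs is what makes an inverted index attractive for sparse fingerprints.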
Original language: English
Pages (from-to): 2395-2401
Journal: Journal of Chemical Information and Modeling
Volume: 54
Issue number: 9
Publication status: Published - 2014
