Consensus versus Individual QSARs in Classification: Comparison on a Large-Scale Case Study

Cecile Valsecchi, Francesca Grisoni, Viviana Consonni, Davide Ballabio

Research output: Contribution to journal, Journal article, Academic, peer-reviewed

28 Citations (Scopus)


Consensus strategies have been widely applied in many different scientific fields, based on the assumption that the fusion of several sources of information increases the outcome reliability. Despite the widespread application of consensus approaches, their advantages in quantitative structure-activity relationship (QSAR) modeling have not been thoroughly evaluated, mainly due to the lack of appropriate large-scale data sets. In this study, we evaluated the advantages and drawbacks of consensus approaches compared to single classification QSAR models. To this end, we used a data set of three properties (androgen receptor binding, agonism, and antagonism) for approximately 4000 molecules with predictions performed by more than 20 QSAR models, made available in a large-scale collaborative project. The individual QSAR models were compared with two consensus approaches, majority voting and the Bayes consensus with discrete probability distributions, in both protective and nonprotective forms. Consensus strategies proved to be more accurate and to better cover the analyzed chemical space than individual QSARs on average, thus motivating their widespread application for property prediction. Scripts and data to reproduce the results of this study are available for download.
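The two consensus approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_agreement` and `min_posterior` thresholds, and the treatment of ties are assumptions made here for illustration, and the Bayes variant assumes that model predictions are conditionally independent given the true class. A threshold of 0 stands in for the nonprotective form; a higher threshold leaves uncertain molecules unclassified, standing in for the protective form.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence


def majority_vote(predictions: Sequence[Optional[str]],
                  min_agreement: float = 0.0) -> Optional[str]:
    """Return the most frequent label among the available predictions.

    None entries (models that gave no prediction) are ignored.
    min_agreement is a hypothetical threshold: the molecule is left
    unclassified (None) if the winning share of votes is below it.
    Ties are also left unclassified (an assumption of this sketch).
    """
    votes = [p for p in predictions if p is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    if top_count / len(votes) < min_agreement:
        return None
    return top_label


def bayes_consensus(predictions: Sequence[Optional[str]],
                    likelihoods: Sequence[Mapping[str, Mapping[str, float]]],
                    priors: Mapping[str, float],
                    min_posterior: float = 0.0) -> Optional[str]:
    """Combine predictions with Bayes' rule over discrete distributions.

    likelihoods[m][true_class][predicted_class] is the probability that
    model m outputs predicted_class when the molecule belongs to
    true_class (for example, estimated from a confusion matrix).
    min_posterior is a hypothetical threshold below which the molecule
    is left unclassified (None).
    """
    posterior = dict(priors)
    for m, pred in enumerate(predictions):
        if pred is None:
            continue  # model m did not predict this molecule
        for cls in posterior:
            posterior[cls] *= likelihoods[m][cls][pred]
    total = sum(posterior.values())
    if total == 0:
        return None
    best = max(posterior, key=lambda cls: posterior[cls])
    if posterior[best] / total < min_posterior:
        return None
    return best
```

In this sketch, three models voting active, inactive, active are merged to active by the plain majority vote, while `min_agreement=0.75` leaves the same molecule unclassified because only two thirds of the votes agree.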

Original language: English
Pages (from-to): 1215-1223
Number of pages: 9
Journal: Journal of Chemical Information and Modeling
Issue number: 3
Status: Published - 23 Mar 2020
Externally published: Yes


The authors thank Dr. Kamel Mansouri for his valuable comments and feedback on the manuscript. F.G. was supported by the Swiss National Science Foundation (SNSF, Grant No. 205321_182176). The QSAR models considered in this work were previously developed in the framework of a collaborative project (Collaborative Modeling Project of Androgen Receptor Activity, CoMPARA), coordinated by the National Center of Computational Toxicology (U.S. Environmental Protection Agency). CoMPARA aimed to develop in silico approaches to identify potential androgen receptor (AR) modulators. This project involved 25 research groups worldwide, which were provided with a calibration set consisting of 1689 chemicals and the corresponding experimental annotations on binding, agonism, and antagonism activities (in the form of qualitative labels, active/inactive), as determined by a battery of 11 in vitro assays. The research groups were then asked to predict another 55 450 chemicals for one or more endpoints (binding, agonism, and antagonism) using their own developed QSAR models. Finally, these predictions were merged through ad hoc consensus approaches, which are currently being used by the CoMPARA coordinators to prioritize experimental tests for potential endocrine-disrupting chemicals.

U.S. Environmental Protection Agency
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung: 205321_182176

