Binary disease prediction using tail quantiles of the distribution of continuous biomarkers

Michiel H.J. Paus (Corresponding author), Edwin R. van den Heuvel, Marc J.M. Meddens

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Numerous techniques exist for binary disease classification, but they work well only when biomarkers differ in mean between cases and controls. Biological processes are, however, much more heterogeneous, and differences may also occur in other distributional characteristics (e.g. variance, skewness). Many machine learning techniques are better able to exploit these higher-order distributional differences, sometimes at the cost of explainability. In this study, we propose quantile-based prediction (QBP), a binary classification method that selects multiple continuous biomarkers and uses the tail differences between the biomarker distributions of cases and controls. The performance of QBP is compared with supervised learning methods in extensive simulation studies and in two case studies: major depressive disorder (MDD) and trisomy. QBP outperformed the alternative methods when biomarkers predominantly showed variance differences between cases and controls, especially in the MDD case study. More research is needed to further optimise QBP.
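
To illustrate why tail information can matter when group means coincide, the following is a minimal sketch on simulated data. It is not the authors' QBP algorithm, whose details are not given in this abstract: the single-biomarker setup, the function names (`fit_tail_thresholds`, `predict_tail`, `predict_midpoint`), the tail level `alpha = 0.05`, and the normal distributions are all assumptions made purely for illustration.

```python
import numpy as np


def fit_tail_thresholds(controls, alpha=0.05):
    """Return the lower and upper alpha-quantiles of the control sample.

    Hypothetical helper: alpha is an illustrative choice, not a value from the paper.
    """
    lower = np.quantile(controls, alpha)
    upper = np.quantile(controls, 1.0 - alpha)
    return lower, upper


def predict_tail(x, lower, upper):
    """Flag values in either tail of the control distribution as cases (1)."""
    x = np.asarray(x)
    return ((x < lower) | (x > upper)).astype(int)


def predict_midpoint(x, controls, cases):
    """Mean-based baseline: threshold at the midpoint of the two group means."""
    x = np.asarray(x)
    mid = 0.5 * (np.mean(controls) + np.mean(cases))
    # Predict "case" on the side of the midpoint where the case mean lies.
    if np.mean(cases) >= np.mean(controls):
        return (x > mid).astype(int)
    return (x < mid).astype(int)


rng = np.random.default_rng(0)

# Simulated biomarker (assumption, not study data): equal means, unequal variances.
train_controls = rng.normal(0.0, 1.0, size=2000)
train_cases = rng.normal(0.0, 3.0, size=2000)
test_controls = rng.normal(0.0, 1.0, size=2000)
test_cases = rng.normal(0.0, 3.0, size=2000)

x_test = np.concatenate([test_controls, test_cases])
y_test = np.concatenate([np.zeros(2000, dtype=int), np.ones(2000, dtype=int)])

lower, upper = fit_tail_thresholds(train_controls, alpha=0.05)
acc_tail = np.mean(predict_tail(x_test, lower, upper) == y_test)
acc_mean = np.mean(predict_midpoint(x_test, train_controls, train_cases) == y_test)

print(f"tail-quantile rule accuracy: {acc_tail:.3f}")
print(f"mean-midpoint rule accuracy: {acc_mean:.3f}")
```

Under these assumptions the midpoint rule should perform close to chance, because the group means are equal, whereas the tail rule can pick up the wider spread of the simulated cases. The sketch covers only one biomarker and a fixed tail level; it says nothing about how QBP selects or combines multiple biomarkers.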

Original language: English
Pages (from-to): 56-87
Number of pages: 32
Journal: Journal of Nonparametric Statistics
Volume: 35
Issue number: 1
Early online date: 28 Nov 2022
Publication status: Published - Jan 2023

Keywords

  • binary classification
  • discriminant analysis
  • feature selection
  • logistic regression
  • Quantile based prediction (QBP)
  • random forest
  • regularisation
  • XGBoost
