TY - JOUR
T1 - Binary disease prediction using tail quantiles of the distribution of continuous biomarkers
AU - Paus, Michiel H.J.
AU - van den Heuvel, Edwin R.
AU - Meddens, Marc J.M.
N1 - Publisher Copyright: © 2022 American Statistical Association and Taylor & Francis.
PY - 2023/1
Y1 - 2023/1
AB - In the analysis of binary disease classification, numerous techniques exist, but they mainly work well for mean differences in biomarkers between cases and controls. Biological processes are, however, much more heterogeneous, and differences could also occur in other distributional characteristics (e.g. variances, skewness). Many machine learning techniques are better able to utilize these higher-order distributional differences, sometimes at the cost of explainability. In this study, we propose quantile-based prediction (QBP), a binary classification method that selects multiple continuous biomarkers and uses the tail differences between the biomarker distributions of cases and controls. The performance of QBP is compared with supervised learning methods in extensive simulation studies and two case studies: major depressive disorder (MDD) and trisomy. QBP outperformed alternative methods when biomarkers predominantly showed variance differences between cases and controls, especially in the MDD case study. More research is needed to further optimise QBP.
KW - binary classification
KW - discriminant analysis
KW - feature selection
KW - logistic regression
KW - Quantile-based prediction (QBP)
KW - random forest
KW - regularisation
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85142860582&partnerID=8YFLogxK
U2 - 10.1080/10485252.2022.2141738
DO - 10.1080/10485252.2022.2141738
M3 - Article
AN - SCOPUS:85142860582
SN - 1048-5252
VL - 35
SP - 56
EP - 87
JO - Journal of Nonparametric Statistics
JF - Journal of Nonparametric Statistics
IS - 1
ER -