TY - JOUR
T1 - Improved one-class modeling of high-dimensional metabolomics data via eigenvalue-shrinkage
AU - Brini, Alberto
AU - Avagyan, Vahe
AU - de Vos, Ric C.H.
AU - Vossen, Jack H.
AU - van den Heuvel, Edwin R.
AU - Engel, Jasper
N1 - Funding Information:
Funding: This work was funded by NWO-PTA-COAST3 through the Outfitting the Factory of the Future with Online analysis (OFF/On) consortium.
Acknowledgments: We would like to thank Gijs Kleter (Wageningen Food Safety Research, Wageningen) and Hilko van der Voet (Biometris, Wageningen) for their insightful comments at different stages of this research. Udo Engelke and R. Wevers are gratefully acknowledged for sharing NMR medical data, and Henriëtte van Eekelen and Bert Schipper (Plant Metabolomics group, WUR-Bioscience) are gratefully acknowledged for their help with the potato LCMS measurements. The potato data set was acquired within the context of the Biotechnology and Safety program (Proj. No. 15815) of the Netherlands Organization for Scientific Research and the Ministry of Infrastructure and Water Management.
PY - 2021/4/13
Y1 - 2021/4/13
N2 - One-class modeling is a useful approach in metabolomics for the untargeted detection of abnormal metabolite profiles when information from a set of reference observations is available to model “normal” or baseline metabolite profiles. Such outlying profiles are typically identified by comparing the distance between an observation and the reference class to a critical limit. Often, multivariate distance measures such as the Mahalanobis distance (MD) or principal component-based measures are used. These approaches, however, are either not applicable to untargeted metabolomics data, or their results are unreliable. In this paper, five distance measures for one-class modeling in untargeted metabolomics are proposed. They are based on a combination of the MD and five so-called eigenvalue-shrinkage estimators of the covariance matrix of the reference class. A simple cross-validation procedure is proposed to set the critical limit for outlier detection. Simulation studies are used to identify which distance measure provides the best performance for one-class modeling, in terms of type I error and power to identify abnormal metabolite profiles. Empirical evidence demonstrates that this method has a better type I error (false positive rate) and greater outlier detection power than the standard (principal component-based) one-class models. The method is illustrated by its application to liquid chromatography coupled to mass spectrometry (LC-MS) and nuclear magnetic resonance spectroscopy (NMR) untargeted metabolomics data from two studies on food safety assessment and diagnosis of rare diseases, respectively.
AB - One-class modeling is a useful approach in metabolomics for the untargeted detection of abnormal metabolite profiles when information from a set of reference observations is available to model “normal” or baseline metabolite profiles. Such outlying profiles are typically identified by comparing the distance between an observation and the reference class to a critical limit. Often, multivariate distance measures such as the Mahalanobis distance (MD) or principal component-based measures are used. These approaches, however, are either not applicable to untargeted metabolomics data, or their results are unreliable. In this paper, five distance measures for one-class modeling in untargeted metabolomics are proposed. They are based on a combination of the MD and five so-called eigenvalue-shrinkage estimators of the covariance matrix of the reference class. A simple cross-validation procedure is proposed to set the critical limit for outlier detection. Simulation studies are used to identify which distance measure provides the best performance for one-class modeling, in terms of type I error and power to identify abnormal metabolite profiles. Empirical evidence demonstrates that this method has a better type I error (false positive rate) and greater outlier detection power than the standard (principal component-based) one-class models. The method is illustrated by its application to liquid chromatography coupled to mass spectrometry (LC-MS) and nuclear magnetic resonance spectroscopy (NMR) untargeted metabolomics data from two studies on food safety assessment and diagnosis of rare diseases, respectively.
KW - Critical value
KW - Eigenvalue-shrinkage
KW - High-dimensional data
KW - Mahalanobis distance
KW - One-class model
KW - Scaled Chi-square distribution
KW - Untargeted metabolomics
UR - http://www.scopus.com/inward/record.url?scp=85104943871&partnerID=8YFLogxK
U2 - 10.3390/metabo11040237
DO - 10.3390/metabo11040237
M3 - Article
C2 - 33924479
AN - SCOPUS:85104943871
SN - 2218-1989
VL - 11
JO - Metabolites
JF - Metabolites
IS - 4
M1 - 237
ER -