Investigating the need for preprocessing of near-infrared spectroscopic data as a function of sample size

Mark Schoot, Christiaan Kapper, Geert H. van Kollenburg, Geert J. Postma, Gijs van Kessel, Lutgarde M. C. Buydens, Jeroen J. Jansen (Corresponding author)

Research output: Contribution to journal › Article › Academic › peer-review

51 Citations (Scopus)

Abstract

Preprocessing of near-infrared (NIR) spectra is an essential part of multivariate calibration. It mainly aims to remove artefacts caused during measurement to improve prediction performance or interpretation. However, preprocessing can have undesired side-effects. Additionally, calibration algorithms can learn to deal with artefacts by themselves when enough samples are available. This may influence the effect preprocessing has on prediction performance when the calibration dataset size increases. In this paper, we investigate the interaction between the size of the calibration data and preprocessing for NIR calibrations for several datasets. Results show that extending the calibration data with more samples improves prediction performance, regardless of the preprocessing strategy. Although prediction performance almost always benefits from preprocessing, extending the calibration data can reduce the effect of preprocessing on prediction performance. This means the optimal preprocessing strategy may change as a function of the number of samples. It is demonstrated that using a Design of Experiments (DoE) approach to determine the optimal preprocessing strategy leads to equal or better prediction performance for all calibration set sizes compared to the case of not preprocessing at all. Preprocessing is most valuable for small calibration sets, but it can become obsolete or even harmful as the calibration set grows. Therefore, we recommend always evaluating the effect of a preprocessing strategy before making or updating calibration models.
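
The recommendation to evaluate a preprocessing strategy at each calibration set size can be sketched as a small comparison loop. The following is a minimal illustration under stated assumptions, not the authors' procedure: the spectra are synthetic (the hypothetical `simulate_spectra`), standard normal variate (SNV) stands in for a single preprocessing strategy, ridge regression stands in for the calibration algorithm, and only "none" versus "snv" are compared rather than a full DoE over preprocessing steps. All function names are illustrative.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def simulate_spectra(n_samples, rng, n_wavelengths=100):
    """Hypothetical synthetic data: one analyte peak plus scatter artefacts."""
    wavelengths = np.linspace(0.0, 1.0, n_wavelengths)
    peak = np.exp(-((wavelengths - 0.5) ** 2) / 0.01)
    background = 1.0 + wavelengths
    concentration = rng.uniform(0.0, 1.0, n_samples)
    pure = concentration[:, None] * peak + background
    scale = rng.uniform(0.8, 1.2, (n_samples, 1))    # multiplicative scatter
    offset = rng.uniform(-0.2, 0.2, (n_samples, 1))  # additive offset
    noise = rng.normal(0.0, 0.01, (n_samples, n_wavelengths))
    return scale * pure + offset + noise, concentration


def ridge_fit_predict(X_cal, y_cal, X_test, alpha=1e-3):
    """Mean-centred ridge regression (an assumed stand-in for the calibration model)."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    Xc = X_cal - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y_cal - y_mean))
    return (X_test - x_mean) @ coef + y_mean


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def compare_strategies(sizes=(10, 40, 160), seed=0):
    """Return RMSEP per calibration set size for raw and SNV-preprocessed spectra."""
    rng = np.random.default_rng(seed)
    X_test, y_test = simulate_spectra(500, rng)
    X_pool, y_pool = simulate_spectra(max(sizes), rng)
    results = {}
    for n in sizes:
        X_cal, y_cal = X_pool[:n], y_pool[:n]
        results[n] = {
            "none": rmsep(y_test, ridge_fit_predict(X_cal, y_cal, X_test)),
            "snv": rmsep(y_test, ridge_fit_predict(snv(X_cal), y_cal, snv(X_test))),
        }
    return results


if __name__ == "__main__":
    for n, errors in compare_strategies().items():
        print(n, errors)
```

Rerunning such a comparison whenever the calibration set is extended shows whether a chosen preprocessing step still helps at the new size. The numbers from this synthetic setup say nothing about the datasets studied in the paper.
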
Original language: English
Article number: 104105
Number of pages: 8
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 204
Status: Published - 15 Sep 2020
Externally published: Yes
