TY - JOUR
T1 - Reshaped Sequential Replacement algorithm
T2 - An efficient approach to variable selection
AU - Cassotti, Matteo
AU - Grisoni, Francesca
AU - Todeschini, Roberto
PY - 2014/4/15
Y1 - 2014/4/15
N2 - A modified version of the Sequential Replacement (SR) algorithm for variable selection is proposed, featuring modern functionalities aimed to: 1) reduce the computational time; 2) estimate the real predictivity of the model; 3) identify models suffering from pathologies. This redesigned version was called Reshaped Sequential Replacement (RSR) algorithm. The RSR algorithm was applied to several datasets in regression and classification and was compared with the original SR method by means of a Design of Experiments (DoE). The DoE took into account the functions that affect the outcome of the search in terms of generated combinations of variables and time required for computation. The results were also compared with published models on the same datasets, taken as reference, and obtained by different variable selection methods. This latter comparison showed that the RSR algorithm managed to find good subsets of variables on all datasets, even though the reference models were not always found. When the reference model was not found the RSR algorithm returned comparable or better subsets of variables, evaluated in cross-validation. The DoE showed that the inclusion of the additional functions allowed to obtain models with equivalent or better performances in a decreased computational time compared to the original SR method.
AB - A modified version of the Sequential Replacement (SR) algorithm for variable selection is proposed, featuring modern functionalities aimed to: 1) reduce the computational time; 2) estimate the real predictivity of the model; 3) identify models suffering from pathologies. This redesigned version was called Reshaped Sequential Replacement (RSR) algorithm. The RSR algorithm was applied to several datasets in regression and classification and was compared with the original SR method by means of a Design of Experiments (DoE). The DoE took into account the functions that affect the outcome of the search in terms of generated combinations of variables and time required for computation. The results were also compared with published models on the same datasets, taken as reference, and obtained by different variable selection methods. This latter comparison showed that the RSR algorithm managed to find good subsets of variables on all datasets, even though the reference models were not always found. When the reference model was not found the RSR algorithm returned comparable or better subsets of variables, evaluated in cross-validation. The DoE showed that the inclusion of the additional functions allowed to obtain models with equivalent or better performances in a decreased computational time compared to the original SR method.
KW - Multivariate analysis
KW - QUIK rule
KW - Roulette wheel
KW - Sequential replacement
KW - Tabu list
KW - Variable selection
UR - http://www.scopus.com/inward/record.url?scp=84897428070&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2014.01.011
DO - 10.1016/j.chemolab.2014.01.011
M3 - Article
AN - SCOPUS:84897428070
SN - 0169-7439
VL - 133
SP - 136
EP - 148
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -