Sparse least trimmed squares regression for analyzing high-dimensional large data sets

A. Alfons, C. Croux, S.E.C. Gelper

Research output: Contribution to journal > Article > Academic > peer review

151 Citations (Scopus)
120 Downloads (Pure)

Abstract

Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
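
To make the idea of "an L1 penalty added to the LTS estimator" concrete, the following is a minimal sketch of how such an objective could be evaluated for a given coefficient vector. It is an illustration only, not the paper's algorithm: the function name `sparse_lts_objective`, the omission of an intercept, and the scaling of the penalty by the subset size `h` are assumptions of this sketch and are not stated in the abstract.

```python
def sparse_lts_objective(X, y, beta, lam, h):
    """Evaluate a sparse-LTS-style objective (hypothetical helper).

    X    : list of rows, each a list of predictor values
    y    : list of responses
    beta : list of coefficients (no intercept in this sketch)
    lam  : non-negative penalty parameter
    h    : number of smallest squared residuals to keep (h <= len(y))
    """
    # Squared residual of every observation, sorted in ascending order.
    residuals_sq = sorted(
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    )
    # LTS part: sum only the h smallest squared residuals, so the
    # n - h worst-fitting observations do not influence the loss.
    trimmed_loss = sum(residuals_sq[:h])
    # Sparsity part: L1 norm of the coefficients.
    # Assumption: the penalty is scaled by h to match the trimmed sum.
    l1_penalty = sum(abs(bj) for bj in beta)
    return trimmed_loss + h * lam * l1_penalty


# Toy data: three observations that fit beta = [1, 2] exactly, plus one outlier.
X = [[1, 0], [0, 1], [1, 1], [10, 10]]
y = [1, 2, 3, 100]
beta = [1, 2]

trimmed = sparse_lts_objective(X, y, beta, lam=0.5, h=3)    # outlier is trimmed away
untrimmed = sparse_lts_objective(X, y, beta, lam=0.5, h=4)  # outlier dominates the loss
```

With `h` smaller than the sample size, the outlying fourth observation drops out of the loss entirely, which is the source of the robustness; the L1 term is what encourages coefficients to be exactly zero when the objective is minimized.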
Original language: English
Pages (from-to): 226-248
Number of pages: 23
Journal: The Annals of Applied Statistics
Volume: 7
Issue number: 1
Status: Published - 2013
Externally published: Yes
