Feature selection for unbiased imputation of missing values: A case study in healthcare

Chetanya Puri, Gerben Kooijman, Xi Long, Paul C. Hamelmann, Asvadi Sima, Bart Vanrumste, Stijn Luca

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed

Abstract

Datasets in healthcare are plagued with incomplete information. Imputation is a common method for dealing with missing data: the basic idea is to substitute a reasonable guess for each missing value and then continue with the analysis as if no data were missing. However, unbiased predictions based on imputed datasets can only be guaranteed when the missingness mechanism is completely independent of both the observed and the missing data. In healthcare data acquisition this assumption is often violated, owing to unintentional errors or response bias of the interviewees. We highlight this issue through an extensive study of an annual health survey dataset used for infant mortality prediction, and we provide a systematic test of this assumption. We identify the biased features using an empirical approach and show how wrongly including them affects predictive performance.

Clinical relevance: We show that blind analysis combined with plug-and-play imputation of healthcare data is a potential pitfall that clinicians and researchers should avoid when searching for important markers of disease.
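As an illustration of the independence assumption described in the abstract, the sketch below checks whether a feature's missingness is associated with a binary outcome. It is not the authors' procedure, which this record does not describe. It is a minimal Pearson chi-square test on a 2x2 table, and the function name `missingness_outcome_test`, the synthetic features `feat_a` and `feat_b`, and the 0.05 threshold are all assumptions made for the example.

```python
import math


def missingness_outcome_test(values, outcomes):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    between a feature's missingness indicator and a binary outcome.

    `values` uses None for a missing entry; `outcomes` holds 0 or 1.
    Returns (chi2, p_value).
    """
    # 2x2 contingency table: rows = missing / observed, columns = outcome 0 / 1
    table = [[0, 0], [0, 0]]
    for v, y in zip(values, outcomes):
        table[0 if v is None else 1][y] += 1

    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        # No variation in missingness or in the outcome: nothing to test
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical synthetic data: 80 negative and 20 positive outcomes
outcomes = [0] * 80 + [1] * 20
# feat_a: 10% missing in both outcome groups (missingness unrelated to outcome)
feat_a = [None] * 8 + [1.0] * 72 + [None] * 2 + [1.0] * 18
# feat_b: 5% missing among negatives, 80% missing among positives
feat_b = [None] * 4 + [1.0] * 76 + [None] * 16 + [1.0] * 4

chi2_a, p_a = missingness_outcome_test(feat_a, outcomes)
chi2_b, p_b = missingness_outcome_test(feat_b, outcomes)

ALPHA = 0.05  # assumed significance threshold
for name, p in (("feat_a", p_a), ("feat_b", p_b)):
    verdict = "missingness depends on outcome" if p < ALPHA else "no evidence of dependence"
    print(f"{name}: p = {p:.3g} ({verdict})")
```

A feature flagged by a check of this kind is one whose imputed values could leak information about the outcome or bias predictions. A non-significant result does not prove that the data are missing completely at random, since missingness can still depend on unobserved values.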
Original language: English
Title: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1911-1915
Number of pages: 5
ISBN (electronic): 978-1-7281-1179-7
Status: Published - 9 Dec 2021
Published externally: Yes
Event: 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021 - Virtual, Mexico
Duration: 1 Nov 2021 to 5 Nov 2021
Conference number: 43
https://embc.embs.org/2021/

Conference

Conference: 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021
Abbreviated title: EMBC 2021
Country/Region: Mexico
Period: 1/11/21 to 5/11/21

Funding

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 766139.

Funders and funder numbers:
H2020 Marie Skłodowska-Curie Actions: 766139
Horizon 2020
