BoostEMM: Transparent boosting using exceptional model mining

S.B. van der Zon, O. Zeev Ben Mordehay, T.S. Vrijdag, W. van Ipenburg, J. Veldsink, W. Duivesteijn, M. Pechenizkiy

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Boosting is an iterative ensemble-learning paradigm. In every iteration, a weak predictor learns a classification task, taking into account the performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have been shown to be effective models at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might limit its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people might raise serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan because the underlying algorithm made a decision by specifically focusing on the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where, in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and explicit definitions of the subgroups of people being boosted enhance the transparency of the algorithm.
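
As a rough illustration of the contrast drawn in the abstract, the Python sketch below places the standard AdaBoost record-level weight update next to a simplified subgroup-level update. It is not the authors' procedure: the abstract does not give the BoostEMM update rule or the Exceptional Model Mining quality measure, so the function `subgroup_reweight`, its `in_subgroup` membership mask, and the `factor` parameter are hypothetical stand-ins that assume a subgroup has already been found.

```python
import math


def adaboost_reweight(weights, misclassified):
    """Standard AdaBoost step: upweight each misclassified record."""
    # Weighted error of the current weak predictor.
    err = sum(w for w, m in zip(weights, misclassified) if m) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this predictor
    new = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new], alpha


def subgroup_reweight(weights, in_subgroup, factor=2.0):
    """Hypothetical subgroup-level step (an assumption, not from the paper):
    upweight every record covered by a subgroup description, regardless of
    whether that individual record was misclassified."""
    new = [w * factor if s else w for w, s in zip(weights, in_subgroup)]
    total = sum(new)
    return [w / total for w in new]


# Four records with uniform weights; only record 1 was misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
record_level, alpha = adaboost_reweight(weights, [False, True, False, False])

# Assumed subgroup (e.g. one described by a rule over attributes)
# that covers records 0 and 1.
group_level = subgroup_reweight(weights, [True, True, False, False])
```

In the record-level update, the single misclassified record ends up carrying half of the total weight. In the subgroup-level sketch, the extra weight is shared by every record that matches the subgroup description, so the description itself states which records are being emphasised.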

Original language: English
Title of host publication: Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia
Editors: I. Bordino, G. Caldarelli, F. Fumarola, F. Gullo, T. Squartini
Pages: 5-16
Number of pages: 12
Publication status: Published - 2017
Event: Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), Skopje, Macedonia, September 18, 2017 - Skopje, Macedonia, The Former Yugoslav Republic of
Duration: 18 Sep 2017 → …
Conference number: 2nd
http://ceur-ws.org/Vol-1941/

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS.org
Volume: 1941
ISSN (Print): 1613-0073

Conference

Conference: Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), Skopje, Macedonia, September 18, 2017
Abbreviated title: MIDAS 2017
Country: Macedonia, The Former Yugoslav Republic of
City: Skopje
Period: 18/09/17 → …

Keywords

  • Boosting
  • Class imbalance
  • Exceptional Model Mining
  • Model transparency
  • Responsible analytics

Cite this

van der Zon, S. B., Zeev Ben Mordehay, O., Vrijdag, T. S., van Ipenburg, W., Veldsink, J., Duivesteijn, W., & Pechenizkiy, M. (2017). BoostEMM: Transparent boosting using exceptional model mining. In I. Bordino, G. Caldarelli, F. Fumarola, F. Gullo, & T. Squartini (Eds.), Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia (pp. 5-16). (CEUR Workshop Proceedings; Vol. 1941).