BoostEMM: Transparent boosting using exceptional model mining

S.B. van der Zon, O. Zeev Ben Mordehay, T.S. Vrijdag, W. van Ipenburg, J. Veldsink, W. Duivesteijn, M. Pechenizkiy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer review


Abstract

Boosting is an iterative ensemble-learning paradigm. In every iteration, a weak predictor learns a classification task, taking into account the performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have been shown to be effective at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might imply limitations for its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people may raise serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan because the underlying algorithm specifically focused on the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and the explicit definitions of the subgroups of people being boosted enhance the transparency of the algorithm.
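
To make the weight-update idea concrete, below is a minimal, self-contained Python sketch of a BoostEMM-style loop. It is not the authors' implementation: the weak learners are decision stumps, the subgroup search is a toy single-attribute threshold scan that merely stands in for a full Exceptional Model Mining run, and names such as find_problematic_subgroup and boost_emm are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def find_problematic_subgroup(X, misclassified, weights):
    # Toy stand-in for Exceptional Model Mining: scan single-attribute
    # threshold descriptions and return the subgroup (a boolean mask) whose
    # members carry the largest weighted share of misclassifications.
    best_mask, best_score = np.ones(len(X), dtype=bool), -1.0
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            for mask in (X[:, j] <= t, X[:, j] > t):
                if mask.sum() < 10:  # skip tiny, uninformative subgroups
                    continue
                score = weights[mask & misclassified].sum() / weights[mask].sum()
                if score > best_score:
                    best_mask, best_score = mask, score
    return best_mask

def boost_emm(X, y, n_rounds=10):
    # AdaBoost-style loop; the only change is that a whole describable
    # subgroup, not each misclassified record individually, is upweighted.
    n = len(X)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.clip(np.sum(w[miss]), 1e-10, 1.0 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)
        subgroup = find_problematic_subgroup(X, miss, w)
        w[subgroup] *= np.exp(alpha)  # boost a coherent, describable group
        w /= w.sum()                  # renormalise the record weights
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

Because every weight increase is tied to an explicit subgroup description (e.g. "attribute j ≤ t"), the ensemble can report which kinds of records it focused on, which is the transparency benefit the abstract argues for.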

Original language: English
Title: Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia
Editors: I. Bordino, G. Caldarelli, F. Fumarola, F. Gullo, T. Squartini
Pages: 5-16
Number of pages: 12
Status: Published - 2017
Event: Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), Skopje, Macedonia, September 18, 2017 - Skopje, Macedonia
Duration: 18 Sep 2017 → …
Conference number: 2nd
http://ceur-ws.org/Vol-1941/

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS.org
Volume: 1941
ISSN (Print): 1613-0073

Conference

Conference: Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), Skopje, Macedonia, September 18, 2017
Abbreviated title: MIDAS 2017
Country: Macedonia
City: Skopje
Period: 18/09/17 → …
Internet address

Fingerprint

Adaptive boosting
Transparency

Cite this

van der Zon, S. B., Zeev Ben Mordehay, O., Vrijdag, T. S., van Ipenburg, W., Veldsink, J., Duivesteijn, W., & Pechenizkiy, M. (2017). BoostEMM : Transparent boosting using exceptional model mining. In I. Bordino, G. Caldarelli, F. Fumarola, F. Gullo, & T. Squartini (editors), Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia (pp. 5-16). (CEUR Workshop Proceedings; Vol. 1941).
van der Zon, S.B. ; Zeev Ben Mordehay, O. ; Vrijdag, T.S. ; van Ipenburg, W. ; Veldsink, J. ; Duivesteijn, W. ; Pechenizkiy, M. / BoostEMM : Transparent boosting using exceptional model mining. Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia. editor / I. Bordino ; G. Caldarelli ; F. Fumarola ; F. Gullo ; T. Squartini. 2017. pp. 5-16 (CEUR Workshop Proceedings).
@inproceedings{ad232d69ed504be998c2fac7a6561560,
title = "BoostEMM : Transparent boosting using exceptional model mining",
abstract = "Boosting is an iterative ensemble-learning paradigm. Every iteration, a weak predictor learns a classification task, taking into account performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have shown to be effective models at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might imply limitations for its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people might throw up serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan, because the underlying algorithm made a decision specifically focusing the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and explicit definitions of the subgroups of people being boosted enhances the transparency of the algorithm.",
keywords = "Boosting, Class imbalance, Exceptional Model Mining, Model transparency, Responsible analytics",
author = "{van der Zon}, S.B. and {Zeev Ben Mordehay}, O. and T.S. Vrijdag and {van Ipenburg}, W. and J. Veldsink and W. Duivesteijn and M. Pechenizkiy",
year = "2017",
language = "English",
series = "CEUR Workshop Proceedings",
publisher = "CEUR-WS.org",
pages = "5--16",
editor = "I. Bordino and G. Caldarelli and F. Fumarola and F. Gullo and T. Squartini",
booktitle = "Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia",

}

van der Zon, SB, Zeev Ben Mordehay, O, Vrijdag, TS, van Ipenburg, W, Veldsink, J, Duivesteijn, W & Pechenizkiy, M 2017, BoostEMM : Transparent boosting using exceptional model mining. in I Bordino, G Caldarelli, F Fumarola, F Gullo & T Squartini (eds), Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia. CEUR Workshop Proceedings, vol. 1941, pp. 5-16, Skopje, Macedonia, 18/09/17.

BoostEMM : Transparent boosting using exceptional model mining. / van der Zon, S.B.; Zeev Ben Mordehay, O.; Vrijdag, T.S.; van Ipenburg, W.; Veldsink, J.; Duivesteijn, W.; Pechenizkiy, M.

Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia. editors / I. Bordino; G. Caldarelli; F. Fumarola; F. Gullo; T. Squartini. 2017. pp. 5-16 (CEUR Workshop Proceedings; Vol. 1941).


TY - GEN

T1 - BoostEMM : Transparent boosting using exceptional model mining

AU - van der Zon, S.B.

AU - Zeev Ben Mordehay, O.

AU - Vrijdag, T.S.

AU - van Ipenburg, W.

AU - Veldsink, J.

AU - Duivesteijn, W.

AU - Pechenizkiy, M.

PY - 2017

Y1 - 2017

N2 - Boosting is an iterative ensemble-learning paradigm. Every iteration, a weak predictor learns a classification task, taking into account performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have shown to be effective models at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might imply limitations for its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people might throw up serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan, because the underlying algorithm made a decision specifically focusing the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and explicit definitions of the subgroups of people being boosted enhances the transparency of the algorithm.

AB - Boosting is an iterative ensemble-learning paradigm. Every iteration, a weak predictor learns a classification task, taking into account performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have shown to be effective models at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might imply limitations for its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people might throw up serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan, because the underlying algorithm made a decision specifically focusing the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and explicit definitions of the subgroups of people being boosted enhances the transparency of the algorithm.

KW - Boosting

KW - Class imbalance

KW - Exceptional Model Mining

KW - Model transparency

KW - Responsible analytics

UR - http://www.scopus.com/inward/record.url?scp=85032436537&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85032436537

T3 - CEUR Workshop Proceedings

SP - 5

EP - 16

BT - Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia

A2 - Bordino, I.

A2 - Caldarelli, G.

A2 - Fumarola, F.

A2 - Gullo, F.

A2 - Squartini, T.

ER -

van der Zon SB, Zeev Ben Mordehay O, Vrijdag TS, van Ipenburg W, Veldsink J, Duivesteijn W et al. BoostEMM : Transparent boosting using exceptional model mining. In Bordino I, Caldarelli G, Fumarola F, Gullo F, Squartini T, editors, Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017), 18 September 2017, Skopje, Macedonia. 2017. pp. 5-16. (CEUR Workshop Proceedings).