The online performance estimation framework: heterogeneous ensemble learning for data streams

J.N. van Rijn, G. Holmes, B. Pfahringer, J. Vanschoren

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Ensembles of classifiers are among the best-performing classifiers available in many data mining applications, including the mining of data streams. Rather than training one classifier, multiple classifiers are trained, and their predictions are combined according to a given voting schedule. An important prerequisite for ensembles to be successful is that the individual models are diverse. One way to vastly increase the diversity among the models is to build a heterogeneous ensemble, composed of fundamentally different model types. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of base-level classifier, most often Hoeffding Trees. We study the use of heterogeneous ensembles for data streams. We introduce the Online Performance Estimation framework, which dynamically weights the votes of individual classifiers in an ensemble. Using an internal evaluation on recent training data, it measures how well each ensemble member performed on that data and updates the member's weight accordingly. Experiments over a wide range of data streams show performance that is competitive with state-of-the-art ensemble techniques, including Online Bagging and Leveraging Bagging, while being significantly faster. All experimental results from this work are easily reproducible and publicly available online.
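
The weighting idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a sliding window of recent labelled examples, uses windowed accuracy as the vote weight, and evaluates each member in a test-then-train fashion. The class names (`WindowedWeightedEnsemble`, `MajorityClass`, `LastLabel`), the `predict`/`learn` interface, and the default window size are all hypothetical, and the two toy learners merely stand in for real stream classifiers such as Hoeffding Trees.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class LastLabel:
    """Toy incremental learner: predicts the most recently seen label."""

    def __init__(self):
        self.last = 0

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


class WindowedWeightedEnsemble:
    """Heterogeneous ensemble whose votes are weighted by recent accuracy."""

    def __init__(self, members, window=100):
        self.members = members
        # One sliding window of 0/1 correctness flags per member.
        self.hits = [deque(maxlen=window) for _ in members]

    def weights(self):
        # Windowed accuracy per member; a member with no history yet
        # gets weight 1.0 (an assumption made for this sketch).
        return [sum(h) / len(h) if h else 1.0 for h in self.hits]

    def predict(self, x):
        # Weighted vote: each member adds its current weight to its label.
        votes = Counter()
        for member, w in zip(self.members, self.weights()):
            votes[member.predict(x)] += w
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        # Test-then-train: score each member on (x, y) before it sees the label.
        for member, h in zip(self.members, self.hits):
            h.append(1 if member.predict(x) == y else 0)
            member.learn(x, y)
```

On a stream whose label switches from 0 to 1 halfway through, the weight of `MajorityClass` drops while that of `LastLabel` stays high, so the ensemble follows the member that has been accurate recently. A fading factor could replace the window; that choice is left open here.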
Original language: English
Pages (from-to): 149–176
Number of pages: 28
Journal: Machine Learning
Volume: 107
Issue number: 1
DOI: 10.1007/s10994-017-5686-9
Publication status: Published - 8 Jan 2018

Fingerprint

Classifiers
Data mining
Experiments

Cite this

van Rijn, J.N. ; Holmes, G. ; Pfahringer, B. ; Vanschoren, J. / The online performance estimation framework: heterogeneous ensemble learning for data streams. In: Machine Learning. 2018 ; Vol. 107, No. 1. pp. 149–176.
@article{7337bd44870444bda605df43a6597dfc,
title = "The online performance estimation framework: heterogeneous ensemble learning for data streams",
author = "{van Rijn}, J.N. and G. Holmes and B. Pfahringer and J. Vanschoren",
year = "2018",
month = "1",
day = "8",
doi = "10.1007/s10994-017-5686-9",
language = "English",
volume = "107",
pages = "149--176",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer",
number = "1",

}
