A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams.

Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify which recent experience is irrelevant when conditions change and which past experience is relevant when concepts recur, e.g., when weather events or financial patterns repeat. Existing streaming approaches fall short in one of three ways: they do not consider that the relevance of experience changes over time and thus cannot handle concept drift, they consider only the recency of experience and thus cannot handle recurring concepts, or they evaluate relevance only sparsely and thus fail when concept drift is missed. To enable learning in changing conditions, we propose SELeCT, a probabilistic method for continuously evaluating the relevance of past experience. SELeCT maintains a distinct internal state for each concept, representing relevant experience with a unique classifier. We propose a Bayesian algorithm for estimating state relevance, combining the likelihood of drawing recent observations from a given state with a transition pattern prior based on the system's current state. The current state is continuously maintained using a Hoeffding-bound-based algorithm which, unlike existing methods, guarantees that every observation is classified using the state estimated as the most relevant, while also maintaining temporal stability. We find that SELeCT is able to choose experience relevant to ground truth concepts with recall and precision above 0.9, significantly outperforming existing methods and approaching a theoretical optimum. This leads to significantly higher accuracy and enables new opportunities for learning in complex changing conditions.
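
The two ideas named in the abstract, a Bayesian relevance estimate and a Hoeffding-bound stability check, can be illustrated with a minimal Python sketch. This is not the SELeCT implementation: the function names, the dictionary-based state representation, the default `delta`, and the rule of switching only when the relevance gap exceeds the bound are all assumptions made for illustration. Only the posterior rule (likelihood times transition prior, normalised) and the standard Hoeffding bound formula are taken as given.

```python
import math


def state_posterior(likelihoods, transition_prior):
    """Relevance of each state, as P(state | obs) proportional to
    P(obs | state) * P(state | current state).

    Both arguments are dicts keyed by a (hypothetical) state id.
    """
    unnormalised = {
        s: likelihoods[s] * transition_prior.get(s, 0.0) for s in likelihoods
    }
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("no state has non-zero relevance")
    return {s: v / total for s, v in unnormalised.items()}


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of
    a variable with the given range lies within this distance of the mean
    observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_state(current, relevance, n, delta=0.05):
    """Assumed switching rule: move to the most relevant state only when its
    lead over the current state exceeds the Hoeffding bound for n recent
    observations; otherwise keep the current state for temporal stability."""
    best = max(relevance, key=relevance.get)
    if best == current:
        return current
    epsilon = hoeffding_bound(1.0, delta, n)  # relevance values lie in [0, 1]
    return best if relevance[best] - relevance[current] > epsilon else current


# Usage with made-up numbers: state "B" explains recent data better,
# but the transition prior favours staying in "A".
posterior = state_posterior({"A": 0.2, "B": 0.6}, {"A": 0.7, "B": 0.3})
state_many_obs = choose_state("A", posterior, n=200)  # tight bound
state_few_obs = choose_state("A", posterior, n=20)    # loose bound
```

In this sketch the same relevance gap triggers a switch when it is backed by many observations and is ignored when it is backed by few, which is one simple way to trade responsiveness against stability.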

Original language: English
Title of host publication: Proceedings - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics, DSAA 2022
Editors: Joshua Zhexue Huang, Yi Pan, Barbara Hammer, Muhammad Khurram Khan, Xing Xie, Laizhong Cui, Yulin He
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781665473309
Publication status: Published - 2022

Bibliographical note

DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

Keywords

  • Data Streams
  • Recurring Concepts

Prizes

  • Best Research Paper Award of IEEE DSAA 2022

    Halstead, Ben (Recipient), Koh, Yun Sing (Recipient), Riddle, Patricia (Recipient), Pechenizkiy, Mykola (Recipient) & Bifet, Albert (Recipient), 2022

    Prize: Other › Career, activity or publication related prizes (lifetime, best paper, poster etc.) › Scientific
