Abstract
Streaming sources of data are becoming more common as the ability to collect data in real time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real-world and synthetic datasets in both accuracy and modeling underlying concept drift.
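The idea of comparing a stored concept representation against a window of observations can be illustrated with a minimal sketch. This is not the FiCSUM implementation: the four meta-information features, the weighted cosine similarity, the fixed weights, and the 0.9 threshold are all assumptions chosen for illustration, and the names `fingerprint`, `weighted_cosine`, and `drift_suspected` are hypothetical.

```python
import math
from statistics import mean, pstdev


def fingerprint(values, errors):
    """Build a small illustrative fingerprint from one window.

    values: numeric observations of a single feature (unsupervised behaviour).
    errors: 0/1 flags marking classifier mistakes (supervised behaviour).
    The choice of features here is an assumption, not the paper's feature set.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma > 0:
        # Lag-1 autocorrelation: covariance of neighbouring points over variance.
        pairs = [(a - mu) * (b - mu) for a, b in zip(values, values[1:])]
        autocorr = mean(pairs) / (sigma ** 2)
    else:
        autocorr = 0.0
    return [mu, sigma, autocorr, mean(errors)]


def weighted_cosine(u, v, weights):
    """Cosine similarity where each dimension is scaled by a weight."""
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def drift_suspected(concept_fp, window_fp, weights, threshold=0.9):
    """Flag drift when the window no longer resembles the stored concept.

    The threshold is a hypothetical constant; a real system would tune it.
    """
    return weighted_cosine(concept_fp, window_fp, weights) < threshold


# Usage: compare a stored concept against a new window of observations.
concept = fingerprint([1, 2, 1, 2], [0, 0, 0, 0])
window = fingerprint([10, 10, 10, 10], [1, 1, 1, 1])
uniform_weights = [1.0, 1.0, 1.0, 1.0]  # assumed fixed; the paper learns these
print(drift_suspected(concept, window, uniform_weights))
```

In this sketch the weights are fixed by hand, whereas the abstract describes a dynamic weighting strategy that learns which features matter for a given dataset. Raw features on different scales can also dominate the similarity, so some form of normalisation would likely be needed in practice.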
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021 |
Publisher | IEEE Computer Society |
Pages | 1056-1067 |
Number of pages | 12 |
ISBN (Electronic) | 9781728191843 |
Publication status | Published - Apr 2021 |
Event | 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece. Duration: 19 Apr 2021 → 22 Apr 2021 |
Conference
Conference | 37th IEEE International Conference on Data Engineering, ICDE 2021 |
---|---|
Country/Territory | Greece |
City | Virtual, Chania |
Period | 19/04/21 → 22/04/21 |
Bibliographical note
Funding Information:
The work was supported by the Marsden Fund Council from New Zealand Government funding (Project ID 18-UOA-005), managed by Royal Society Te Aparangi.
Publisher Copyright:
© 2021 IEEE.
Funding
The work was supported by the Marsden Fund Council from New Zealand Government funding (Project ID 18-UOA-005), managed by Royal Society Te Aparangi.
Keywords
- Concept Drift
- Data Stream
- Meta-Information
- Recurring Concepts