Handling changes over time in supervised learning (concept drift) has lately received a great deal of attention, and a number of adaptive learning strategies have been developed. Most of them make the optimistic assumption that new labels become available immediately. In real sequential classification tasks this is often unrealistic because of task-specific labeling delays or the costs associated with labeling. We address the problem of change detectability given that the new labels are not available. In this analytical study we look at the space of changes from a probabilistic perspective to analyze which changes are detectable without seeing the labels and which are not. We conduct a range of experiments on real-life data with simulated and natural changes to explore this detectability issue. We propose a computationally friendly detection technique that monitors a stream of classifier outputs. We demonstrate analytically and experimentally which types of changes can be detected when the labels for the new data are not available.
|Title of host publication||Proceedings of the 10th IEEE International Conference on Data Mining (ICDM, Sydney, Australia, December 14-17, 2010)|
|Publisher||Institute of Electrical and Electronics Engineers|
|Publication status||Published - 2010|