Abstract
The random forest (RF) is among the best-performing multi-class classifiers and is popular across many machine learning applications. It is known for high computational efficiency during both training and testing, while delivering highly accurate results. Conventionally, however, RF is trained in an off-line mode, which requires the entire training set to be available beforehand. This imposes practical limitations: the training data must be compiled in advance, and any subsequent changes in the data distribution are disregarded, even when the data is sequential. In this paper, we investigate the incremental learning behavior of the RF algorithm. We generate an initial RF from limited training data and update it incrementally as new data arrives. We have developed three incremental learning strategies for RF, distinguished by the criterion used to select trees for an update: all update, random update, and performance-based update. We have tested our methods on several publicly available multi-class static streaming data sets. The results show that the performance-based update of RF yields classification accuracy comparable to an off-line RF, while requiring a significantly lower computational cost.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 32nd WIC Symposium on Information Theory in the Benelux, 10-11 May 2011, Brussels, Belgium |
| Place of publication | Delft |
| Publisher | Werkgemeenschap voor Informatie- en Communicatietheorie (WIC) |
| Pages | 1-6 |
| Publication status | Published - 2011 |