The random forest (RF) technique is among the best-performing multi-class classifiers and is popular in many machine learning applications. RFs are known for high computational efficiency during training and testing while delivering highly accurate results. Conventionally, however, an RF is trained in an off-line mode, which requires the entire training set to be available beforehand. This imposes practical limitations, such as having to compile the training data in advance and disregarding any further changes in the data distribution, even when the data is sequential. In this paper, we investigate the incremental learning behavior of the RF algorithm. We generate an initial RF from a limited amount of training data and update the RF incrementally as new data arrives. We have developed three incremental learning strategies for the RF, which differ in the criterion used to select the trees to update: all update, random update, and performance-based update. We have tested our methods on several publicly available multi-class static streaming data sets. The results show that the performance-based update yields a classification accuracy comparable to that of an off-line RF at a significantly lower computational cost.
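The three tree-selection strategies named above can be illustrated with a minimal sketch. The abstract does not state how the performance-based criterion ranks trees or how many trees are updated, so this sketch assumes that each tree has an accuracy measured on the newest batch of data and that the lowest-scoring fraction is chosen. The function name `select_trees_to_update` and the parameters `fraction` and `rng` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Optional, Sequence


def select_trees_to_update(
    tree_accuracies: Sequence[float],
    strategy: str,
    fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return the indices of the trees to update for one new batch.

    tree_accuracies[i] is the accuracy of tree i on the newest batch
    (an assumption of this sketch, not a detail taken from the paper).
    """
    n = len(tree_accuracies)
    if strategy == "all":
        # All update: every tree in the forest is updated.
        return list(range(n))

    # Number of trees to update; `fraction` is a hypothetical knob.
    k = min(n, max(1, int(fraction * n)))

    if strategy == "random":
        # Random update: pick k distinct trees uniformly at random.
        rng = rng or random.Random()
        return sorted(rng.sample(range(n), k))

    if strategy == "performance":
        # Performance-based update (assumed criterion): pick the k trees
        # with the lowest accuracy on the newest batch.
        ranked = sorted(range(n), key=lambda i: tree_accuracies[i])
        return sorted(ranked[:k])

    raise ValueError(f"unknown strategy: {strategy!r}")


accuracies = [0.9, 0.4, 0.7, 0.5]
worst_half = select_trees_to_update(accuracies, "performance")
```

With the example accuracies, the performance-based choice covers the two weakest trees, whereas the all-update choice would touch all four. Updating fewer trees per batch is where a lower computational cost could come from under these assumptions.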
|Title of host publication
|Proceedings of the 32nd WIC Symposium on Information Theory in the Benelux, 10-11 May 2011, Brussels, Belgium
|Place of Publication
|Werkgemeenschap voor Informatie- en Communicatietheorie (WIC)
|Published - 2011