We propose a novel collaborative approach to document classification that combines the knowledge of multiple users to improve the organization of data such as individual document repositories or emails. To this end, we distribute locally built classification models across a network of participating users and combine the shared classifiers into more powerful meta models. To increase propagation efficiency, we apply a method that selects the most discriminative model components and transmits only these to other participants. In experiments on four large standard collections for text classification, we study the resulting tradeoffs between network cost and classification accuracy. The experimental results show that the proposed model propagation incurs negligible communication costs and substantially outperforms current approaches with respect to efficiency and classification quality.
- Clustering, classification and association rules
- Data mining
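
The propagation scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the method evaluated in the paper: it assumes, purely for illustration, that each local model is a linear classifier stored as a term-to-weight dictionary, that weight magnitude serves as a proxy for how discriminative a component is, and that the meta model is an unweighted majority vote. The names `select_top_components`, `score`, and `meta_classify` are hypothetical.

```python
# Assumption: a local model is a dict mapping term -> weight, and a
# positive summed score means the document belongs to the class.

def select_top_components(model, k):
    """Keep the k weights of largest magnitude (assumed proxy for
    discriminative power); only these would be sent over the network."""
    ranked = sorted(model.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:k])

def score(model, doc_terms):
    """Sum the weights of the terms that occur in the document."""
    return sum(model.get(term, 0.0) for term in doc_terms)

def meta_classify(models, doc_terms):
    """Combine models by unweighted majority vote (assumed meta model).
    A tie is treated as a negative decision."""
    votes = sum(1 if score(m, doc_terms) > 0 else -1 for m in models)
    return votes > 0

# Toy data, invented for this example only.
local_model = {"football": 2.0, "goal": 1.5, "the": 0.01, "election": -1.8}
peer_a = {"match": 1.2, "senate": -2.0}
peer_b = {"goal": 0.9, "vote": -1.1}

compressed = select_top_components(local_model, k=2)
models = [compressed, peer_a, peer_b]

sports_doc = ["football", "match", "goal"]
politics_doc = ["election", "senate", "vote"]
print(compressed)
print(meta_classify(models, sports_doc), meta_classify(models, politics_doc))
```

In this sketch the parameter `k` controls the tradeoff named in the abstract: a smaller `k` means fewer components to transmit, at the risk of discarding weights that the meta model would have needed.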