Abstract
I consider a binary classification problem with a high-dimensional feature vector; spam mail filters are a popular example. A Bayesian approach requires us to estimate the probability of a feature vector given the class of the object. Because of the size of the feature vector, this is an infeasible task. A useful approach is to split the feature space into several (conditionally) independent subspaces. This results in a new problem, namely how to find the "best" subdivision. In this paper I consider a weighting approach that performs (asymptotically) as well as the best subdivision and still has manageable complexity.
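The weighting idea can be illustrated with a small sketch. It is not the algorithm of the paper, only a minimal Bayesian mixture under assumptions made for illustration: features are binary, a subdivision is given as a list of index blocks, each block's conditional probability is estimated sequentially with an additive (alpha = 0.5) estimator, and only a short, explicitly listed set of candidate subdivisions is weighted with a uniform prior. The names `BlockModel`, `WeightedMixture`, and `make_toy_data` are hypothetical.

```python
from collections import defaultdict
from math import exp, log


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


class BlockModel:
    """Sequential estimate of P(x | c) for ONE subdivision of binary features.

    Blocks are treated as conditionally independent given the class;
    within a block the joint distribution of its features is estimated.
    """

    def __init__(self, partition, alpha=0.5):
        self.partition = partition          # e.g. [[0, 1], [2]]
        self.alpha = alpha                  # additive smoothing (assumption)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def log_prob(self, x, c):
        lp = 0.0
        for b, block in enumerate(self.partition):
            key = tuple(x[i] for i in block)
            n_values = 2 ** len(block)      # binary features assumed
            num = self.counts[(c, b)][key] + self.alpha
            den = self.totals[(c, b)] + self.alpha * n_values
            lp += log(num / den)
        return lp

    def update(self, x, c):
        for b, block in enumerate(self.partition):
            key = tuple(x[i] for i in block)
            self.counts[(c, b)][key] += 1
            self.totals[(c, b)] += 1


class WeightedMixture:
    """Uniform-prior Bayesian mixture over a list of candidate subdivisions."""

    def __init__(self, partitions, alpha=0.5):
        self.models = [BlockModel(p, alpha) for p in partitions]
        self.log_w = [0.0] * len(self.models)   # unnormalised log weights
        self.class_counts = [0, 0]
        self.alpha = alpha

    def log_prob(self, x, c):
        """log of the weighted average of the models' P(x | c)."""
        terms = [lw + m.log_prob(x, c) for lw, m in zip(self.log_w, self.models)]
        return _logsumexp(terms) - _logsumexp(self.log_w)

    def update(self, x, c):
        # Each weight accumulates the sequential probability its model
        # assigned to the observation before seeing it.
        for k, m in enumerate(self.models):
            self.log_w[k] += m.log_prob(x, c)
            m.update(x, c)
        self.class_counts[c] += 1

    def weights(self):
        z = _logsumexp(self.log_w)
        return [exp(lw - z) for lw in self.log_w]

    def classify(self, x):
        """Return the class (0 or 1) with the larger posterior score."""
        n = sum(self.class_counts)
        scores = []
        for c in (0, 1):
            prior = (self.class_counts[c] + self.alpha) / (n + 2 * self.alpha)
            scores.append(log(prior) + self.log_prob(x, c))
        return 0 if scores[0] >= scores[1] else 1


def make_toy_data(n=200):
    """Deterministic toy data: x1 always equals x0, x2 is unrelated to both."""
    data = []
    for i in range(n):
        c = i % 2
        u = (i // 2) % 5
        x0 = c if u < 4 else 1 - c          # x0 agrees with the class 80% of the time
        x2 = (i // 10) % 2
        data.append(((x0, x0, x2), c))
    return data
```

On the toy data, feeding each `(x, c)` pair to `update` for a mixture built from `[[0], [1], [2]]` and `[[0, 1], [2]]` shifts the weight towards the second subdivision, which models the dependence between the first two features. Enumerating all subdivisions explicitly, as done here, does not scale to high-dimensional feature vectors; the sketch only shows why a weighted mixture tracks the best candidate.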
Original language | English |
---|---|
Title of host publication | Proceedings of the 29th Symposium on Information Theory in the Benelux, May 29-30, 2008, Leuven, Belgium |
Editors | L. Van der Perre, A. Dejonghe, V. Ramon |
Place of Publication | Leuven |
Publisher | IMEC |
Pages | 121-128 |
ISBN (Print) | 978-90-9023135-8 |
Publication status | Published - 2008 |