On combining principal components with Fisher's linear discriminants for supervised learning

M. Pechenizkiy, A. Tsymbal, S. Puuronen

Research output: Contribution to journal › Article › Academic › peer-review


"The curse of dimensionality" is pertinent to many learning algorithms; it denotes the drastic increase in computational complexity and classification error in high dimensions. In this paper, principal component analysis (PCA), parametric feature extraction (FE) based on Fisher’s linear discriminant analysis (LDA), and their combination are analysed as means of dimensionality reduction with respect to the performance of different classifiers. Three commonly used classifiers are considered: kNN, Naïve Bayes and the C4.5 decision tree. Recently, it has been argued that it is extremely important to use class information in FE for supervised learning (SL). However, LDA-based FE, although it uses class information, has a serious shortcoming due to its parametric nature: the number of extracted components cannot be more than the number of classes minus one. Moreover, as its name suggests, LDA works well mainly for linearly separable classes. In this paper we study whether these shortcomings can be overcome by adding the most significant principal components to the set of features extracted with LDA. In experiments on 21 benchmark datasets from the UCI repository, the two approaches (PCA and LDA) are compared with each other and with their combination, for each classifier. Our results demonstrate that the combination approach has some potential, especially when applied to C4.5 decision tree learning. However, from a practical point of view, the combination approach cannot be recommended for Naïve Bayes, since its behavior is very unstable across datasets.
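The abstract does not spell out the exact procedure, so the following is only a minimal NumPy sketch of the idea, assuming the combined representation is formed by concatenating the LDA projections with projections onto the k leading principal components, both computed from the same (training) data. The function names `pca_components`, `lda_components` and `combined_features`, and the small ridge term `reg` used to keep the within-class scatter invertible, are hypothetical illustrations rather than the authors' implementation. The sketch also makes the limitation mentioned above concrete: the between-class scatter matrix has rank at most (number of classes − 1), so LDA yields at most that many informative directions.

```python
import numpy as np


def pca_components(X: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: the k leading principal directions of X (as columns)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)          # sample covariance, features as columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort by explained variance, descending
    return eigvecs[:, order[:k]]


def lda_components(X: np.ndarray, y: np.ndarray, reg: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: Fisher discriminant directions, at most n_classes - 1."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xk = X[y == c]
        mu = Xk.mean(axis=0)
        S_w += (Xk - mu).T @ (Xk - mu)
        diff = (mu - mean_all).reshape(-1, 1)
        S_b += len(Xk) * (diff @ diff.T)
    # Assumption: a small ridge keeps S_w positive definite. Whiten with S_w so that
    # W.T @ S_w @ W = I, then diagonalise the whitened between-class scatter.
    s, U = np.linalg.eigh(S_w + reg * np.eye(d))
    W = U / np.sqrt(s)
    vals, V = np.linalg.eigh(W.T @ S_b @ W)
    order = np.argsort(vals)[::-1]
    m = min(len(classes) - 1, d)  # rank(S_b) <= n_classes - 1
    return W @ V[:, order[:m]]


def combined_features(X: np.ndarray, y: np.ndarray, n_pc: int) -> np.ndarray:
    """Hypothetical combination: LDA features followed by the n_pc leading PCs."""
    Xc = X - X.mean(axis=0)
    Z_lda = Xc @ lda_components(X, y)
    Z_pca = Xc @ pca_components(X, n_pc)
    return np.hstack([Z_lda, Z_pca])


# Usage on synthetic data: 3 classes, 5 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(30, 5)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)
Z = combined_features(X, y, n_pc=2)
print(Z.shape)  # (90, 4): 2 LDA components (3 classes - 1) + 2 principal components
```

With three classes LDA alone can supply only two features; appending principal components is one way to give the downstream classifier (kNN, Naïve Bayes or C4.5) additional, class-agnostic directions of high variance. In practice the projections would be learned on the training split only and then applied to the test split.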
Original language: English
Pages (from-to): 59-73
Journal: Foundations of Computing and Decision Sciences
Issue number: 1
Publication status: Published - 2006

