The natural representation of XML data is its underlying tree structure. Analyzing these trees directly ensures that no structural information is lost. Such tree structures can be analyzed efficiently because frequent pattern mining algorithms exist that work directly on tree-structured data. In this work we describe a classification method for XML data based on frequent attribute trees. From these frequent patterns we select so-called emerging patterns and use them as binary features in a decision tree algorithm. The experimental results show that combining emerging attribute tree patterns with standard classification methods is a promising approach to the classification of XML documents.
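The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions. Each document and each pattern is simplified to a flat set of attribute names, so "pattern occurs in document" becomes a subset test rather than a tree-inclusion test. A pattern counts as emerging for a class when its support there is at least `min_growth` times its support in the other class, which is a common growth-rate criterion. The names `support`, `emerging_patterns`, `to_binary_features`, and `min_growth` are hypothetical, and the frequent patterns are assumed to have been mined already.

```python
from typing import FrozenSet, List

# Simplifying assumption: a pattern (and a document) is a flat set of
# attribute names, not an attribute tree as in the paper.
Pattern = FrozenSet[str]


def support(pattern: Pattern, docs: List[Pattern]) -> float:
    """Fraction of documents that contain every attribute of the pattern."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if pattern <= d) / len(docs)


def emerging_patterns(
    patterns: List[Pattern],
    target_docs: List[Pattern],
    other_docs: List[Pattern],
    min_growth: float = 2.0,  # hypothetical threshold, not taken from the paper
) -> List[Pattern]:
    """Keep patterns whose support grows by at least min_growth in the target class."""
    selected = []
    for p in patterns:
        s_target = support(p, target_docs)
        s_other = support(p, other_docs)
        if s_target == 0.0:
            continue  # absent from the target class, so it cannot be emerging
        growth = float("inf") if s_other == 0.0 else s_target / s_other
        if growth >= min_growth:
            selected.append(p)
    return selected


def to_binary_features(doc: Pattern, patterns: List[Pattern]) -> List[int]:
    """One 0/1 feature per selected pattern: 1 if the pattern occurs in the document."""
    return [1 if p <= doc else 0 for p in patterns]


# Toy data, invented for illustration only.
class_a = [frozenset({"article", "title", "author"}),
           frozenset({"article", "title", "abstract"})]
class_b = [frozenset({"article", "body"}),
           frozenset({"article", "title", "body"})]
candidates = [frozenset({"article"}), frozenset({"title"}),
              frozenset({"body"}), frozenset({"title", "author"})]

selected = emerging_patterns(candidates, class_a, class_b)
features = [to_binary_features(d, selected) for d in class_a + class_b]
```

The resulting 0/1 vectors in `features` are the kind of input that any standard decision tree learner could consume, with the class of each document as the label.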
|Title of host publication||Comparative Evaluation of XML Information Retrieval Systems (5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Dagstuhl Castle, Germany, December 17-20, 2006, Revised and Selected Papers)|
|Editors||N. Fuhr, M. Lalmas, A. Trotman|
|Place of Publication||Berlin|
|Publication status||Published - 2006|
|Name||Lecture Notes in Computer Science|
Knijf, de, J. (2006). FAT-CAT : frequent attributes tree based classification. In N. Fuhr, M. Lalmas, & A. Trotman (Eds.), Comparative Evaluation of XML Information Retrieval Systems (5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Dagstuhl Castle, Germany, December 17-20, 2006, Revised and Selected Papers) (pp. 485-496). (Lecture Notes in Computer Science; Vol. 4518). Springer. https://doi.org/10.1007/978-3-540-73888-6_45