Abstract
Given a database of records, it might be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. Finding such interesting relationships, which we call exceptional relationships, in an automated way would allow the discovery of unusual or exceptional hidden behaviour. In this paper, we formulate the problem of mining exceptional relationships as a special case of exceptional model mining and propose a grammar-guided genetic programming algorithm (MERG3P) that enables the discovery of any exceptional relationships. In particular, MERG3P can work directly not only with categorical data but also with numerical data. In the experimental evaluation, we conduct a case study on mining exceptional relations between well-known and widely used quality measures of association rules, whose exceptional behaviour would be of interest to pattern mining experts. For this purpose, we constructed a data set comprising a wide range of values for each considered association rule quality measure, such that possible exceptional relations between measures could be discovered. Thus, besides validating MERG3P, we found that the Support and Leverage measures are in fact negatively correlated under certain conditions, whereas experts in the field generally expect these measures to be positively correlated.
Keywords: Association rules; Exceptional subgroups; Genetic programming
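
For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of how Support and Leverage are commonly defined for an association rule A → C: support is the fraction of transactions containing both A and C, and leverage is that fraction minus the product of the individual fractions of A and C. The function name `support_and_leverage` and the toy transactions are illustrative assumptions and are not taken from the paper or from MERG3P.

```python
def support_and_leverage(transactions, antecedent, consequent):
    """Return (support, leverage) of the rule antecedent -> consequent.

    Uses the standard textbook definitions (an assumption here, since the
    abstract does not spell them out):
      support  = P(A and C)
      leverage = P(A and C) - P(A) * P(C)
    """
    if not transactions:
        raise ValueError("at least one transaction is required")
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Fraction of transactions that contain every item of each itemset.
    p_a = sum(antecedent <= set(t) for t in transactions) / n
    p_c = sum(consequent <= set(t) for t in transactions) / n
    p_ac = sum((antecedent | consequent) <= set(t) for t in transactions) / n
    return p_ac, p_ac - p_a * p_c


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, leverage = support_and_leverage(transactions, {"bread"}, {"milk"})
print(support, leverage)
```

In this toy data the rule bread → milk holds in half of the transactions, yet its leverage is negative because bread and milk co-occur less often than independence would predict. This only illustrates that the two measures can point in different directions for a single rule; it says nothing about the conditions identified in the paper.
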
Original language | English |
---|---|
Pages (from-to) | 571-594 |
Number of pages | 24 |
Journal | Knowledge and Information Systems |
Volume | 47 |
Issue number | 3 |
DOIs | |
Status | Published - Jun 2016 |