Expressive power of an algebra for data mining

T. Calders, L.V.S. Lakshmanan, R.T. Ng, J. Paredaens

    Research output: Contribution to journal › Article › Academic › peer-review

    20 Citations (Scopus)

    Abstract

    The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can accommodate disparate mining tasks? We address this problem by presenting a database model and an algebra for data mining. The database model is based on the 3W-model introduced by Johnson et al. [2000]. This model relied on black box mining operators. A main contribution of this article is to open up these black boxes, by using generic operators in a data mining algebra. Two key operators in this algebra are regionize, which creates regions (or models) from data tuples, and a restricted form of looping called mining loop. Then the resulting data mining algebra MA is studied and properties concerning expressive power and complexity are established. We present results in three directions: (1) expressiveness of the mining algebra; (2) relations with alternative frameworks; and (3) interactions between regionize and mining loop.
    Original language: English
    Pages (from-to): 1169-1214
    Journal: ACM Transactions on Database Systems
    Volume: 31
    Issue number: 4
    DOIs
    Publication status: Published - 2006