Column generation based math-heuristic for classification trees

M. Firat (Corresponding author), Guillaume Crognier, Adriana Gabor, C.A.J. Hurkens, Yingqian Zhang

Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

Uittreksel

This paper explores the use of Column Generation (CG)
techniques in constructing univariate binary decision trees for
classification tasks. We propose a novel Integer Linear Programming (ILP)
formulation, based on root-to-leaf paths in decision trees. The model is
solved via a Column Generation based heuristic. To speed up the
heuristic, we use a restricted instance data by considering a subset of
decision splits, sampled from the solutions of the well-known CART
algorithm. Extensive numerical experiments show that our approach is
competitive with the state-of-the-art ILP-based algorithms. In
particular, the proposed approach is capable of handling big data sets
with tens of thousands of data rows. Moreover, for large data sets, it
finds solutions competitive to CART.
TaalEngels
Artikelnummer1810.06684v1
Aantal pagina's29
TijdschriftComputers & Operations Research
StatusIngediend - 6 jul 2019

Vingerafdruk

Classification Tree
Column Generation
Decision trees
Linear programming
Integer Linear Programming
Heuristics
Decision tree
Large Data Sets
Univariate
Leaves
Speedup
Numerical Experiment
Roots
Binary
Path
Subset
Formulation
Experiments
Column generation
Integer linear programming

Trefwoorden

    Citeer dit

    Firat, M., Crognier, G., Gabor, A., Hurkens, C. A. J., & Zhang, Y. (2019). Column generation based math-heuristic for classification trees. Manuscript ingediend voor publicatie.
    @article{cdbdfed1b2914d308d1150f78ee22f24,
    title = "Column generation based math-heuristic for classification trees",
    abstract = "This paper explores the use of Column Generation (CG)techniques in constructing univariate binary decision trees forclassification tasks. We propose a novel Integer Linear Programming (ILP)formulation, based on root-to-leaf paths in decision trees. The model issolved via a Column Generation based heuristic. To speed up theheuristic, we use a restricted instance data by considering a subset ofdecision splits, sampled from the solutions of the well-known CARTalgorithm. Extensive numerical experiments show that our approach iscompetitive with the state-of-the-art ILP-based algorithms. Inparticular, the proposed approach is capable of handling big data setswith tens of thousands of data rows. Moreover, for large data sets, itfinds solutions competitive to CART.",
    keywords = "Machine Learning, Decision Trees, Column Generation, Classification, CART, integer linear programming",
    author = "M. Firat and Guillaume Crognier and Adriana Gabor and C.A.J. Hurkens and Yingqian Zhang",
    year = "2019",
    month = "7",
    day = "6",
    language = "English",
    journal = "Computers & Operations Research",
    issn = "0305-0548",
    publisher = "Elsevier",

    }

    Column generation based math-heuristic for classification trees. / Firat, M. (Corresponding author); Crognier, Guillaume; Gabor, Adriana; Hurkens, C.A.J.; Zhang, Yingqian.

    In: Computers & Operations Research, 06.07.2019.

    Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

    TY - JOUR

    T1 - Column generation based math-heuristic for classification trees

    AU - Firat,M.

    AU - Crognier,Guillaume

    AU - Gabor,Adriana

    AU - Hurkens,C.A.J.

    AU - Zhang,Yingqian

    PY - 2019/7/6

    Y1 - 2019/7/6

    N2 - This paper explores the use of Column Generation (CG)techniques in constructing univariate binary decision trees forclassification tasks. We propose a novel Integer Linear Programming (ILP)formulation, based on root-to-leaf paths in decision trees. The model issolved via a Column Generation based heuristic. To speed up theheuristic, we use a restricted instance data by considering a subset ofdecision splits, sampled from the solutions of the well-known CARTalgorithm. Extensive numerical experiments show that our approach iscompetitive with the state-of-the-art ILP-based algorithms. Inparticular, the proposed approach is capable of handling big data setswith tens of thousands of data rows. Moreover, for large data sets, itfinds solutions competitive to CART.

    AB - This paper explores the use of Column Generation (CG)techniques in constructing univariate binary decision trees forclassification tasks. We propose a novel Integer Linear Programming (ILP)formulation, based on root-to-leaf paths in decision trees. The model issolved via a Column Generation based heuristic. To speed up theheuristic, we use a restricted instance data by considering a subset ofdecision splits, sampled from the solutions of the well-known CARTalgorithm. Extensive numerical experiments show that our approach iscompetitive with the state-of-the-art ILP-based algorithms. Inparticular, the proposed approach is capable of handling big data setswith tens of thousands of data rows. Moreover, for large data sets, itfinds solutions competitive to CART.

    KW - Machine Learning

    KW - Decision Trees

    KW - Column Generation

    KW - Classification

    KW - CART

    KW - integer linear programming

    UR - http://arxiv.org/abs/1810.06684

    M3 - Article

    JO - Computers & Operations Research

    T2 - Computers & Operations Research

    JF - Computers & Operations Research

    SN - 0305-0548

    M1 - 1810.06684v1

    ER -