Column generation based heuristic for learning classification trees

Murat Firat (Corresponding author), Guillaume Crognier, Adriana F. Gabor, C.A.J. Hurkens, Yingqian Zhang

Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

47 Downloads (Pure)

Uittreksel

This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on root-to-leaf paths in decision trees. The model is solved via a Column Generation based heuristic. To speed up the heuristic, we use a restricted instance data by considering a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with the state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive to CART.

Originele taal-2Engels
Artikelnummer104866
Aantal pagina's11
TijdschriftComputers & Operations Research
Volume116
DOI's
StatusGepubliceerd - apr 2020

Vingerafdruk

Classification Tree
Column Generation
Decision trees
Linear programming
Integer Linear Programming
Heuristics
Decision tree
Large Data Sets
Univariate
Leaves
Speedup
Numerical Experiment
Roots
Binary
Path
Subset
Formulation
Experiments
Learning
Column generation

Citeer dit

@article{cdbdfed1b2914d308d1150f78ee22f24,
title = "Column generation based heuristic for learning classification trees",
abstract = "This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on root-to-leaf paths in decision trees. The model is solved via a Column Generation based heuristic. To speed up the heuristic, we use a restricted instance data by considering a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with the state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive to CART.",
keywords = "CART, Classification, Column generation, Decision trees, Integer linear programming, Machine learning",
author = "Murat Firat and Guillaume Crognier and Gabor, {Adriana F.} and C.A.J. Hurkens and Yingqian Zhang",
year = "2020",
month = "4",
doi = "10.1016/j.cor.2019.104866",
language = "English",
volume = "116",
journal = "Computers & Operations Research",
issn = "0305-0548",
publisher = "Elsevier",

}

Column generation based heuristic for learning classification trees. / Firat, Murat (Corresponding author); Crognier, Guillaume; Gabor, Adriana F.; Hurkens, C.A.J.; Zhang, Yingqian.

In: Computers & Operations Research, Vol. 116, 104866, 04.2020.

Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

TY - JOUR

T1 - Column generation based heuristic for learning classification trees

AU - Firat, Murat

AU - Crognier, Guillaume

AU - Gabor, Adriana F.

AU - Hurkens, C.A.J.

AU - Zhang, Yingqian

PY - 2020/4

Y1 - 2020/4

N2 - This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on root-to-leaf paths in decision trees. The model is solved via a Column Generation based heuristic. To speed up the heuristic, we use a restricted instance data by considering a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with the state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive to CART.

AB - This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on root-to-leaf paths in decision trees. The model is solved via a Column Generation based heuristic. To speed up the heuristic, we use a restricted instance data by considering a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with the state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive to CART.

KW - CART

KW - Classification

KW - Column generation

KW - Decision trees

KW - Integer linear programming

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=85076856329&partnerID=8YFLogxK

U2 - 10.1016/j.cor.2019.104866

DO - 10.1016/j.cor.2019.104866

M3 - Article

AN - SCOPUS:85076856329

VL - 116

JO - Computers & Operations Research

JF - Computers & Operations Research

SN - 0305-0548

M1 - 104866

ER -