Racing trees to query partial data

Vu-Linh Nguyen, Sébastien Destercke (Corresponding author), Marie-Hélène Masson, Rashad Ghassani

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Dealing with partially known or missing data is a common problem in machine learning. This work addresses the problem of querying the true value of data that are only partially known, in order to improve the quality of the learned model. The study falls within active learning, since we assume that the precise value of some partial data can be queried to reduce uncertainty in the learning process, but it handles any kind of partial data, not only entirely missing values. We propose a querying strategy based on racing algorithms, in which several models compete. The idea is to identify the query that will most help to decide quickly which model wins the competition. After discussing and formalizing the general ideas of our approach, we study the particular case of decision trees with interval-valued features and set-valued labels. The experimental results indicate that the proposed approach significantly outperforms simpler baseline strategies in the case of partially specified features, while it achieves similar performance in the case of partially specified labels.
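The racing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes one-dimensional threshold classifiers instead of full decision trees, a 0/1 loss, and a simple hypothetical scoring rule (query the interval-valued instance that is ambiguous for the most surviving models). All function names are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical data format: (lower bound, upper bound, label in {0, 1}).
# A precise instance has lower bound == upper bound.
Instance = Tuple[float, float, int]


def loss_bounds(threshold: float, inst: Instance) -> Tuple[int, int]:
    """Min and max 0/1 loss of the rule 'predict 1 if x > threshold'
    over every value x compatible with the interval."""
    lo, hi, y = inst
    if hi <= threshold:
        preds = {0}          # the whole interval falls on the '0' side
    elif lo > threshold:
        preds = {1}          # the whole interval falls on the '1' side
    else:
        preds = {0, 1}       # the interval straddles the threshold
    losses = [int(p != y) for p in preds]
    return min(losses), max(losses)


def risk_bounds(threshold: float, data: List[Instance]) -> Tuple[int, int]:
    """Lower and upper bounds on the empirical risk of one model."""
    bounds = [loss_bounds(threshold, d) for d in data]
    return sum(b[0] for b in bounds), sum(b[1] for b in bounds)


def surviving(thresholds: List[float], data: List[Instance]) -> List[float]:
    """Keep models that could still win: a model is eliminated when its
    best-case risk exceeds the worst-case risk of some competitor."""
    risks = {t: risk_bounds(t, data) for t in thresholds}
    best_upper = min(r[1] for r in risks.values())
    return [t for t in thresholds if risks[t][0] <= best_upper]


def best_query(thresholds: List[float], data: List[Instance]) -> Optional[int]:
    """Index of the partial instance whose loss is uncertain for the most
    surviving models (an assumed heuristic), or None if all data are precise."""
    alive = surviving(thresholds, data)
    candidates = [i for i, (lo, hi, _) in enumerate(data) if lo < hi]
    if not candidates:
        return None

    def score(i: int) -> int:
        return sum(1 for t in alive if loss_bounds(t, data[i]) == (0, 1))

    return max(candidates, key=score)


if __name__ == "__main__":
    data = [(0.0, 0.0, 0), (0.8, 0.8, 0), (1.0, 1.0, 0),
            (4.0, 4.0, 1), (1.5, 3.5, 1), (2.8, 3.2, 0)]
    models = [0.5, 2.0, 3.0]
    print("surviving models:", surviving(models, data))
    print("instance to query:", best_query(models, data))
```

In this toy setting, querying the selected instance replaces its interval with a precise value, which tightens the risk bounds and may eliminate further models on the next round.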
Original language: English
Pages (from-to): 9285-9305
Number of pages: 21
Journal: Soft Computing : a Fusion of Foundations, Methodologies and Applications
Volume: 25
Issue number: 14
Publication status: Published - Jul 2021
Externally published: Yes

Keywords

  • Active learning
  • Data querying
  • Decision trees
  • Partial data
  • Racing algorithms
