Supervised human-guided data exploration

Emilia Oikarinen, Kai Puolamäki, Samaneh Khoshrou, Mykola Pechenizkiy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

An exploratory data analysis system should be aware of what a user already knows and what the user wants to know about the data. Otherwise it is impossible to provide the user with truly informative and useful views of the data. In our recently introduced framework for human-guided data exploration (Puolamäki et al. [20]), both the user’s knowledge and objectives are modelled as distributions over data, parametrised by tile constraints. This makes it possible to show users the most informative views given their current knowledge and objectives. Often, however, the data comes with a class label and the user is interested only in the features that are informative about the class. In non-interactive settings there exist dimensionality reduction methods, such as supervised PCA (SPCA; Barshan et al. [1]), for making such visualisations, but no such method takes the user’s knowledge or objectives into account. Here, we formulate an information criterion for supervised human-guided data exploration that finds the most informative views of the class structure of the data by taking both the user’s current knowledge and objectives into account. We experimentally study the scalability of our method for interactive use and its stability with respect to the size of the class of interest. We show that our method gives understandable and useful results when analysing real-world datasets, and a comparison to SPCA demonstrates the effect of the user’s background knowledge. The implementation will be released as an open source software library.

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings
Editors: Peggy Cellier, Kurt Driessens
Publisher: Springer
Pages: 85-101
Number of pages: 17
ISBN (Print): 9783030438227
DOIs: https://doi.org/10.1007/978-3-030-43823-4_8
Publication status: Published - 1 Jan 2020
Event: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019 - Wurzburg, Germany
Duration: 16 Sep 2019 to 20 Sep 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1167 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Country: Germany
City: Wurzburg
Period: 16/09/19 to 20/09/19

Cite this

Oikarinen, E., Puolamäki, K., Khoshrou, S., & Pechenizkiy, M. (2020). Supervised human-guided data exploration. In P. Cellier & K. Driessens (Eds.), Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings (pp. 85-101). (Communications in Computer and Information Science; Vol. 1167 CCIS). Springer. https://doi.org/10.1007/978-3-030-43823-4_8