Supervised human-guided data exploration

Emilia Oikarinen, Kai Puolamäki, Samaneh Khoshrou, Mykola Pechenizkiy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


An exploratory data analysis system should be aware of what a user already knows and what the user wants to know about the data. Otherwise it is impossible to provide the user with truly informative and useful views of the data. In our recently introduced framework for human-guided data exploration (Puolamäki et al. [20]), both the user's knowledge and objectives are modelled as distributions over data, parametrised by tile constraints. This makes it possible to show users the most informative views given their current knowledge and objectives. Often, however, the data come with a class label, and the user is interested only in the features that are informative about the class. In non-interactive settings there exist dimensionality reduction methods for making such visualisations, such as supervised PCA (SPCA; Barshan et al. [1]), but none of them takes the user's knowledge or objectives into account. Here, we formulate an information criterion for supervised human-guided data exploration that finds the most informative views about the class structure of the data by taking both the user's current knowledge and objectives into account. We experimentally study the scalability of our method for interactive use and its stability with respect to the size of the class of interest. We show that our method gives understandable and useful results when analysing real-world datasets, and a comparison to SPCA demonstrates the effect of the user's background knowledge. The implementation will be released as an open source software library.
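To make the comparison baseline concrete, the sketch below shows the core of supervised PCA as described by Barshan et al.: with the centring matrix H and a kernel matrix L over the labels, the projection directions are the leading eigenvectors of Xᵀ H L H X (rows of X are data points). This is a minimal sketch, not the authors' information criterion or their released library. The delta kernel on class labels, the power-iteration solver that returns only the leading direction, and all function names (`centered`, `label_kernel`, `scatter`, `top_direction`, `spca_direction`, `pca_direction`) are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def centered(X: Matrix) -> Matrix:
    """Subtract column means; equivalent to computing H X with H = I - 11^T / n."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def label_kernel(y: Sequence) -> Matrix:
    """Assumed delta kernel on class labels: L_ij = 1 if y_i == y_j, else 0."""
    return [[1.0 if a == b else 0.0 for b in y] for a in y]


def scatter(Xc: Matrix, L: Matrix) -> Matrix:
    """Return Q = Xc^T L Xc, which equals X^T H L H X because H is symmetric and idempotent."""
    n, d = len(Xc), len(Xc[0])
    # LX = L @ Xc  (n x d)
    LX = [[sum(L[i][k] * Xc[k][b] for k in range(n)) for b in range(d)] for i in range(n)]
    # Q = Xc^T @ LX  (d x d)
    return [[sum(Xc[i][a] * LX[i][b] for i in range(n)) for b in range(d)] for a in range(d)]


def top_direction(Q: Matrix, iters: int = 200) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix by power iteration.

    Starts from the all-ones vector, so it can fail if that start is
    orthogonal to the leading eigenvector; a real solver would not have this issue.
    """
    d = len(Q)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(Q[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Fix the sign so the largest-magnitude component is positive.
    k = max(range(d), key=lambda i: abs(v[i]))
    return [-x for x in v] if v[k] < 0 else v


def spca_direction(X: Matrix, y: Sequence) -> List[float]:
    """First supervised-PCA direction using the delta label kernel."""
    return top_direction(scatter(centered(X), label_kernel(y)))


def pca_direction(X: Matrix) -> List[float]:
    """With L = I the same objective reduces to ordinary PCA on the scatter matrix."""
    n = len(X)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return top_direction(scatter(centered(X), identity))


if __name__ == "__main__":
    # Toy data: classes differ along feature 0, while feature 1 has large label-independent variance.
    X = [[-1.0, -5.0], [-1.0, 5.0], [1.0, -5.0], [1.0, 5.0]]
    y = [0, 0, 1, 1]
    print("SPCA direction:", spca_direction(X, y))
    print("PCA direction: ", pca_direction(X))
```

On the toy data, PCA picks the high-variance but class-irrelevant feature 1, while the label kernel steers SPCA to feature 0, which separates the classes. Note that neither view accounts for what the user already knows, which is the gap the abstract describes.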

Original language: English
Title: Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings
Editors: Peggy Cellier, Kurt Driessens
Number of pages: 17
ISBN (Print): 9783030438227
Status: Published - 1 Jan 2020
Event: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Würzburg, Germany
Duration: 16 Sep 2019 to 20 Sep 2019
Conference number: 19

Publication series

Name: Communications in Computer and Information Science
Volume: 1167 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937


Conference: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019)
Abbreviated title: ECML PKDD 2019