Knowledge present in a domain is well expressed as relationships between corresponding concepts. For example, in zoology, animal species form complex hierarchies; in genomics, the different (parts of) molecules are organized in groups and subgroups based on their functions; plants, molecules, and astronomical objects all form complex taxonomies. Nevertheless, when applying supervised machine learning (ML) in such domains, we commonly reduce the complex and rich knowledge to a fixed set of labels, and induce a model shows good generalization performance with respect to these labels. The main reason for such a reductionist approach is the difficulty in eliciting the domain knowledge from the experts. Developing a label structure with sufficient fidelity and providing comprehensive multi-label annotation can be exceedingly labor-intensive in many real-world applications. In this paper, we provide a method for efficient hierarchical knowledge elicitation (HKE) from experts working with high-dimensional data such as images or videos. Our method is based on psychometric testing and active deep metric learning. The developed models embed the high-dimensional data in a metric space where distances are semantically meaningful, and the data can be organized in a hierarchical structure. We provide empirical evidence with a series of experiments on a synthetically generated dataset of simple shapes, and Cifar 10 and Fashion-MNIST benchmarks that our method is indeed successful in uncovering hierarchical structures.
|Publication status||E-pub ahead of print - 14 Apr 2020|
Bibliographical note16 pages, 11 figures