Onderzoeksoutput per jaar
Onderzoeksoutput per jaar
Ariyan Bighashdel (Corresponding author), Pavol Jancura, Gijs Dubbelman
Onderzoeksoutput: Bijdrage aan tijdschrift › Tijdschriftartikel › Academic › peer review
In this paper, we define a novel inverse reinforcement learning (IRL) problem where the demonstrations are multi-intention, i.e., collected from multi-intention experts, unlabeled, i.e., without intention labels, and partially overlapping, i.e., shared between multiple intentions. In the presence of overlapping demonstrations, current IRL methods, developed to handle multi-intention and unlabeled demonstrations, cannot successfully learn the underlying reward functions. To solve this limitation, we propose a novel clustering-based approach to disentangle the observed demonstrations and experimentally validate its advantages. Traditional clustering-based approaches to multi-intention IRL, which are developed on the basis of model-based Reinforcement Learning (RL), formulate the problem using parametric density estimation. However, in high-dimensional environments and unknown system dynamics, i.e., model-free RL, the solution of parametric density estimation is only tractable up to the density normalization constant. To solve this, we formulate the problem as a mixture of logistic regressions to directly handle the unnormalized density. To research the challenges faced by overlapping demonstrations, we introduce the concepts of shared pair, which is a state-action pair that is shared in more than one intention, and separability, which resembles how well the multiple intentions can be separated in the joint state-action space. We provide theoretical analyses under the global optimality condition and the existence of shared pairs. Furthermore, we conduct extensive experiments on four simulated robotics tasks, extended to accept different intentions with specific levels of separability, and a synthetic driver task developed to directly control the separability. We evaluate the existing baselines on our defined problem and demonstrate, theoretically and experimentally, the advantages of our clustering-based solution, especially when the separability of the demonstrations decreases.
Originele taal-2 | Engels |
---|---|
Artikelnummer | 10.1007/s10994-022-06273-x |
Pagina's (van-tot) | 2263-2296 |
Aantal pagina's | 34 |
Tijdschrift | Machine Learning |
Volume | 112 |
Nummer van het tijdschrift | 7 |
DOI's | |
Status | Gepubliceerd - jul. 2023 |
Onderzoeksoutput: Bijdrage aan tijdschrift › Commentaar/Brief aan de redacteur › Academic › peer review