Abstract
To design agents capable of navigating sequential decision-making problems, it is essential to address the multi-objective nature of many real-world settings. We study agents operating in multi-objective problems while optimising for a known, but possibly non-linear, utility function. We extend the expert iteration framework, a technique combining reinforcement learning and planning, to multi-objective settings and demonstrate how to apply strong baselines such as AlphaZero and Gumbel AlphaZero using a scalarisation scheme. As an alternative to direct scalarisation, we propose Distributional Search with Complex Objectives (DISCO), which extends the expert iteration framework to learn a distribution over vector returns. This distribution may subsequently be used during learning, or to enable transfer to different utility functions after learning. Through experiments, we demonstrate that DISCO is competitive with the baselines while opening avenues for future research.
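The abstract's motivation for learning a distribution over vector returns, rather than a single scalarised value, can be illustrated with a small sketch. It assumes a discrete (categorical) return distribution for one action; the names `expected_utility`, `utility_of_mean`, the utilities and the numbers are hypothetical, and this is not DISCO's implementation. Under a non-linear utility, applying the utility to each outcome and then averaging can differ from applying it to the mean vector return, and a stored distribution can be re-evaluated under a different utility function without relearning.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]
Utility = Callable[[Vector], float]


def expected_utility(atoms: Sequence[Vector], probs: Sequence[float],
                     utility: Utility) -> float:
    """E[u(R)]: apply the utility to each possible vector return, then average."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * utility(r) for r, p in zip(atoms, probs))


def utility_of_mean(atoms: Sequence[Vector], probs: Sequence[float],
                    utility: Utility) -> float:
    """u(E[R]): average the vector returns first, then apply the utility."""
    dims = len(atoms[0])
    mean = tuple(sum(p * r[i] for r, p in zip(atoms, probs)) for i in range(dims))
    return utility(mean)


def min_utility(r: Vector) -> float:
    # Hypothetical non-linear utility: only the worst objective counts.
    return min(r)


def linear_utility(r: Vector) -> float:
    # Hypothetical linear utility (weighted sum of the two objectives).
    return 0.75 * r[0] + 0.25 * r[1]


# Hypothetical return distribution for one action: two equally likely outcomes.
atoms = [(10.0, 0.0), (0.0, 10.0)]
probs = [0.5, 0.5]

print(expected_utility(atoms, probs, min_utility))  # 0.0
print(utility_of_mean(atoms, probs, min_utility))   # 5.0
# Same distribution, different utility: no relearning needed (both print 5.0).
print(expected_utility(atoms, probs, linear_utility))
print(utility_of_mean(atoms, probs, linear_utility))
```

For the linear utility both orders agree, which is why a scalarised value suffices in that case; for the non-linear `min_utility` they diverge, so the full distribution carries information a single scalar cannot.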
Original language | English |
---|---|
Status | Published - 2024 |
Event | Multi-objective Decision Making Workshop at ECAI 2024 - Santiago de Compostela, Spain. Duration: 20 Oct 2024 → 20 Oct 2024. https://modem2024.vub.ac.be/ |
Workshop
Workshop | Multi-objective Decision Making Workshop at ECAI 2024 |
---|---|
Acronym | MODeM 2024 |
Country/Territory | Spain |
City | Santiago de Compostela |
Period | 20/10/24 → 20/10/24 |
Internet address | https://modem2024.vub.ac.be/ |