Model-Based Reinforcement Learning in Multi-Objective Environments with a Distributional Critic

Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu, Hendrik Baier

Research output: Contribution to conference › Paper › Academic


Abstract

To design agents capable of navigating sequential decision-making problems, it is essential to address the multi-objective nature of many real-world settings. We study agents operating in multi-objective problems while optimising for a known, but possibly non-linear, utility function. We extend the expert iteration framework, a technique combining reinforcement learning and planning, to multi-objective settings and demonstrate how to apply strong baselines such as AlphaZero and Gumbel AlphaZero using a scalarisation scheme. As an alternative to direct scalarisation, we propose Distributional Search with Complex Objectives (DISCO), which extends the expert iteration framework to learn a distribution over vector returns. This distribution may subsequently be used during learning or to enable transfer to different utility functions after learning. Through experiments, we demonstrate that DISCO is competitive with the baselines while opening avenues for future research.
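The abstract motivates learning a distribution over vector returns rather than a single expected return when the utility function is non-linear. The following minimal sketch, with an illustrative utility function and synthetic return samples that are not taken from the paper, shows why this distinction matters: under a non-linear utility, the expected utility of the return distribution generally differs from the utility of the expected return.

```python
import numpy as np

# Hypothetical, illustrative utility over two-dimensional vector returns
# (not the utility function used in the paper): reward objective 0 linearly,
# penalise objective 1 quadratically.
def utility(returns: np.ndarray) -> np.ndarray:
    return returns[..., 0] - 0.5 * returns[..., 1] ** 2

# A distributional critic models a distribution over vector returns.
# Here we stand in for such a critic with samples from an assumed
# return distribution.
rng = np.random.default_rng(0)
return_samples = rng.normal(loc=[2.0, 1.0], scale=[0.5, 0.8], size=(1000, 2))

# Expected utility computed from the full return distribution ...
expected_utility = utility(return_samples).mean()
# ... versus the utility of the expected (mean) vector return.
utility_of_expectation = utility(return_samples.mean(axis=0))

print(f"E[u(Z)] = {expected_utility:.3f}")
print(f"u(E[Z]) = {utility_of_expectation:.3f}")
```

Because the two quantities diverge for non-linear utilities, a critic that only predicts expected vector returns can misrank actions, whereas a distribution over returns also allows re-evaluation under a different utility function after learning, as the abstract notes.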
Original language: English
Status: Published - 2024
Event: Multi-objective Decision Making Workshop at ECAI 2024 - Santiago de Compostela, Spain
Duration: 20 Oct 2024 - 20 Oct 2024
https://modem2024.vub.ac.be/

Workshop

Workshop: Multi-objective Decision Making Workshop at ECAI 2024
Abbreviated title: MODeM 2024
Country/Region: Spain
City: Santiago de Compostela
Period: 20/10/24 - 20/10/24
Internet address: https://modem2024.vub.ac.be/
