
Evolving Constrained Reinforcement Learning Policy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Evolutionary algorithms have been used to evolve a population of actors to generate diverse experiences for training reinforcement learning agents, which helps to tackle the temporal credit assignment problem and improves the exploration efficiency. However, when adapting this approach to address constrained problems, balancing the trade-off between the reward and constraint violation is hard. In this paper, we propose a novel evolutionary constrained reinforcement learning (ECRL) algorithm, which adaptively balances the reward and constraint violation with stochastic ranking, and at the same time, restricts the policy's behaviour by maintaining a set of Lagrange relaxation coefficients with a constraint buffer. Extensive experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms. Ablation analysis shows the benefits of introducing stochastic ranking and constraint buffer.
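To make the ranking idea concrete, the following is a minimal sketch of classic stochastic ranking as it is commonly described for constrained evolutionary optimisation, not the paper's own implementation. The function name `stochastic_rank`, the parameter `p_f`, its default value, and the use of a scalar violation per individual are all assumptions made for illustration. Adjacent individuals are compared on reward when both are feasible, or otherwise with probability `p_f`; in the remaining cases they are compared on constraint violation.

```python
import random


def stochastic_rank(rewards, violations, p_f=0.45, seed=None):
    """Return population indices ordered best-first (hypothetical helper).

    rewards:    higher is better.
    violations: non-negative constraint violation, 0 means feasible.
    p_f:        probability of comparing on reward when at least one
                of the two individuals is infeasible (assumed default).
    """
    rng = random.Random(seed)
    n = len(rewards)
    order = list(range(n))
    for _ in range(n):  # at most n bubble-sort-like sweeps
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            both_feasible = violations[a] == 0 and violations[b] == 0
            if both_feasible or rng.random() < p_f:
                # Compare on reward: the higher reward moves forward.
                worse_first = rewards[a] < rewards[b]
            else:
                # Compare on violation: the lower violation moves forward.
                worse_first = violations[a] > violations[b]
            if worse_first:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # no swap in a full sweep: ranking has settled
    return order
```

With `p_f = 0` the ranking always prefers feasible individuals over infeasible ones, and with `p_f = 1` it ignores constraints entirely; intermediate values trade off the two criteria.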
Original language: English
Title of host publication: 2023 International Joint Conference on Neural Networks, IJCNN 2023
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (electronic): 978-1-6654-8867-9
Status: Published - 2 Aug 2023
Externally published: Yes

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62250710682, 61906083), the Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), the Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531), the Shenzhen Fundamental Research Program (Grant No. JCYJ20190809121403553), and the Research Institute of Trustworthy Autonomous Systems. Corresponding author: Jialin Liu ([email protected]).
