Data-Efficient Quadratic Q-Learning Using LMIs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer review

Abstract

Reinforcement learning (RL) has produced significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods based on parametrized Q-learning. In these methods, the Q-function is chosen to be linear in the parameters and quadratic in selected basis functions of the state and control deviations from a base policy, and a cost penalizing the $\ell_1$-norm of the Bellman errors is minimized. The two methods, Linear Matrix Inequality Q-Learning (LMI-QL) and its iterative variant (LMI-QLi), solve the resulting episodic optimization problem through convex optimization. LMI-QL relies on a convex relaxation that yields a semidefinite programming (SDP) problem with linear matrix inequalities (LMIs), while LMI-QLi solves a sequence of SDP problems. Both methods combine convex optimization with direct Q-function learning, significantly improving learning speed. A numerical case study demonstrates their advantages over existing parametrized Q-learning methods.
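To make the general recipe in the abstract concrete, the following is a minimal Python/cvxpy sketch of fitting a Q-function that is quadratic in features and linear in a parameter matrix, by minimizing the $\ell_1$-norm of Bellman residuals under a semidefiniteness (LMI) constraint. All names, the synthetic data, the fixed successor action (standing in for the minimization over the next control, which is where the paper's convex relaxation enters), and the constraint P ⪰ εI are illustrative assumptions, not the authors' actual LMI-QL formulation.

```python
import numpy as np
import cvxpy as cp

# Hypothetical batch of transitions, encoded through basis features:
# Q(x, u) = z(x, u)^T P z(x, u) is linear in the parameter matrix P.
rng = np.random.default_rng(0)
n_z, N = 4, 50                        # feature dimension, batch size
Z = rng.standard_normal((N, n_z))     # z(x_k, u_k)
Zp = rng.standard_normal((N, n_z))    # z(x_{k+1}, u'_{k+1}), successor action
                                      # fixed by a base policy (assumption)
c = rng.random(N)                     # stage costs
gamma = 0.95                          # discount factor

P = cp.Variable((n_z, n_z), symmetric=True)

# Bellman residuals: delta_k = Q(x_k, u_k) - (c_k + gamma * Q(x_{k+1}, u'_{k+1})).
# With P as the variable and the features constant, each term is affine in P.
q = cp.hstack([cp.quad_form(Z[k], P) for k in range(N)])
qp = cp.hstack([cp.quad_form(Zp[k], P) for k in range(N)])
delta = q - (c + gamma * qp)

# Minimize the l1-norm of the Bellman errors subject to an LMI on P;
# the result is an SDP. The constraint here is only a stand-in for the
# paper's LMI structure.
prob = cp.Problem(cp.Minimize(cp.norm1(delta)), [P >> 1e-6 * np.eye(n_z)])
prob.solve()
print("learned P:\n", P.value)
```

Because the residuals are affine in P, the $\ell_1$ objective and the LMI constraint together form a convex problem that off-the-shelf SDP solvers handle directly; an iterative variant in the spirit of LMI-QLi would re-solve such a problem over successive episodes as new data arrive.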
Original language: English
Title: 2024 63rd IEEE Conference on Decision and Control (CDC)
Publisher: Institute of Electrical and Electronics Engineers
Status: Accepted/In press - 24 Jul 2024
Event: 63rd IEEE Annual Conference on Decision and Control, CDC 2024 - Milan, Italy
Duration: 16 Dec 2024 – 19 Dec 2024

Conference

Conference: 63rd IEEE Annual Conference on Decision and Control, CDC 2024
Country/Territory: Italy
City: Milan
Period: 16/12/24 – 19/12/24
