More for Less: Safe Policy Improvement With Stronger Performance Guarantees

Patrick Wienhöft, Marnix Suilen, Thiago D. Simão, Clemens Dubslaff, Christel Baier, Nils Jansen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

1 Citation (Scopus)

Abstract

In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that provides the means to require less data for such guarantees. Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI. Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm.
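For orientation, SPIBB (the algorithm named in the abstract) keeps the behavior policy on state-action pairs that appear fewer than a threshold number of times in the data set, and only optimizes on the remaining pairs. The sketch below illustrates that counting step only. It is not taken from the paper: the function name `bootstrapped_pairs`, the tuple layout `(state, action, reward, next_state)`, and the toy data are assumptions made for illustration, and `n_wedge` stands for the count threshold usually written N_∧ in the SPIBB literature.

```python
from collections import Counter


def bootstrapped_pairs(dataset, states, actions, n_wedge):
    """Return the state-action pairs observed fewer than n_wedge times.

    dataset: iterable of (state, action, reward, next_state) tuples
             (hypothetical layout, assumed for this sketch).
    n_wedge: count threshold below which a pair is treated as uncertain.
    """
    # Count how often each (state, action) pair occurs in the data set.
    counts = Counter((s, a) for s, a, _reward, _next_state in dataset)
    # Pairs that never occur have count 0 and are therefore uncertain too.
    return {(s, a) for s in states for a in actions if counts[(s, a)] < n_wedge}


# Toy data set: two states, two actions (values are made up).
dataset = [
    (0, 0, 0.0, 1),
    (0, 0, 0.0, 1),
    (0, 1, 1.0, 0),
    (1, 0, 0.0, 0),
]
uncertain = bootstrapped_pairs(dataset, states=[0, 1], actions=[0, 1], n_wedge=2)
print(sorted(uncertain))
```

On the pairs returned here, an SPIBB-style learner would copy the behavior policy. The smaller the threshold that still supports a given guarantee, the fewer samples are needed before a pair leaves this set.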

Original language: English
Title of host publication: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Editors: Edith Elkind
Publisher: International Joint Conferences on Artificial Intelligence (IJCAI)
Pages: 4406-4415
Number of pages: 10
ISBN (Electronic): 9781956792034
Publication status: Published - 2023
Event: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China
Duration: 19 Aug 2023 - 25 Aug 2023

Conference

Conference: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Country/Territory: China
City: Macao
Period: 19/08/23 - 25/08/23

Bibliographical note

Funding Information:
The authors were partially supported by the DFG through the Cluster of Excellence EXC 2050/1 (CeTI, project ID 390696704, as part of Germany’s Excellence Strategy), the TRR 248 (see https://perspicuous-computing.science, project ID 389792660), the NWO grants OCENW.KLEIN.187 (Provably Correct Policies for Uncertain Partially Observable Markov Decision Processes) and NWA.1160.18.238 (PrimaVera), and the ERC Starting Grant 101077178 (DEUCE).

Publisher Copyright:
© 2023 International Joint Conferences on Artificial Intelligence. All rights reserved.
