Least-squares temporal difference with expected eligibility traces

Research output: Contribution to journal, Journal article, Academic, peer-reviewed

Abstract

Temporal Difference (TD) and Least-Squares Temporal Difference (LSTD) are related methods for estimating the value function of a Markov Decision Process (MDP). While TD is a direct method that uses local data to update the value function estimate, LSTD is a projected Bellman equation method that uses the full data to compute a one-time estimate. TD(λ) and LSTD(λ) extend TD and LSTD with eligibility traces. While estimating the value function, TD(λ) and LSTD(λ) use actual histories of features as traces. Recently, expected eligibility traces have been proposed for TD(λ) to include not only actual histories, but also all potential histories of features that could have occurred based on the model or the available data. While this idea can account for non-linear feature architectures, here we limit ourselves to linear feature architectures with full data updates in the context of LSTD. We show that, in striking contrast with the direct versions, an extension of LSTD that includes the theoretical expected eligibility traces is equivalent to LSTD without eligibility traces (LSTD(0)). We obtain a similar result if we consider mixed eligibility traces: a combination of expected eligibility traces and ordinary eligibility traces. In fact, we show that LSTD with theoretical mixed eligibility traces is equivalent to LSTD(λ′) for a given λ′ that captures both the decay of the eligibility trace and the balance between the expected eligibility trace and the ordinary trace. Furthermore, we consider the alternative methods LSET(λ) and LSET(η,λ), which rely on the empirical means of the eligibility traces rather than on the theoretical expected eligibility traces, and show that their value estimates converge to those of LSTD(0) and LSTD(λ′), respectively.
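
For readers unfamiliar with the baseline method, the following is a minimal sketch of standard LSTD(λ) with ordinary eligibility traces and linear features, written in plain Python. It is not the LSET(λ) or LSET(η,λ) method from the article, and it does not implement expected eligibility traces. The names `solve`, `lstd_lambda` and `one_hot`, the three-state cyclic chain, the rewards and the discount factor are all hypothetical choices made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(transitions, features, gamma, lam):
    """Standard LSTD(lambda) on one trajectory of (s, r, s_next) tuples.

    Accumulates A = sum_t z_t (phi_t - gamma * phi_{t+1})^T and
    b = sum_t z_t r_t, with trace z_t = gamma * lam * z_{t-1} + phi_t,
    then returns the weight vector theta solving A theta = b.
    """
    k = len(features(transitions[0][0]))
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    z = [0.0] * k  # ordinary (actual-history) eligibility trace
    for s, r, s_next in transitions:
        phi, phi_next = features(s), features(s_next)
        z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
        for i in range(k):
            b[i] += z[i] * r
            for j in range(k):
                A[i][j] += z[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def one_hot(s, n=3):
    """Tabular features: one indicator per state (an assumption of this toy)."""
    return [1.0 if i == s else 0.0 for i in range(n)]


# Hypothetical deterministic chain 0 -> 1 -> 2 -> 0 with state rewards.
rewards = [1.0, 0.0, 2.0]
gamma = 0.9
trajectory = [(t % 3, rewards[t % 3], (t + 1) % 3) for t in range(30)]

theta_0 = lstd_lambda(trajectory, one_hot, gamma, lam=0.0)
theta_half = lstd_lambda(trajectory, one_hot, gamma, lam=0.5)

# Closed-form value of state 0 on this cycle, for comparison.
v0 = (rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2]) / (1 - gamma**3)
```

In this toy setting the transitions are deterministic and the one-hot features can represent the value function exactly, so `theta_0` and `theta_half` coincide with the true values regardless of λ. With stochastic transitions or coarser features the estimates for different λ generally differ, which is the setting the abstract addresses.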

Original language: English
Article number: 269
Pages (from-to): 1-21
Number of pages: 21
Journal: Machine Learning
Volume: 114
Issue number: 12
Early online date: 7 Nov 2025
Status: Published - Dec 2025

Funding

The research leading to these results is partially funded by the German Federal Ministry of Education and Research (BMBF) within the project ASIMOV-D under grant agreement No. 01IS21022G [DLR], based on a decision of the German Bundestag. The research is carried out as part of the ITEA4 20216 ASIMOV project. The ASIMOV activities are supported by the Netherlands Organisation for Applied Scientific Research TNO and the Dutch Ministry of Economic Affairs and Climate (project number: AI211006).
