Model-based multi-objective reinforcement learning

  • M.A. Wiering
  • M. Withagen
  • M.M. Drugan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision-making problem; a multi-objective dynamic programming method then uses this learned model to compute Pareto optimal policies. The advantage of this model-based method is that, once an accurate model has been estimated from the agent's experiences in an environment, the dynamic programming method computes all Pareto optimal policies. It is therefore important that the agent explores the environment intelligently by using a good exploration strategy. We supply the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy quickly learns all Pareto optimal policies for the Deep Sea Treasure problem.
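
As a rough illustration of the two ingredients the abstract names, the sketch below estimates a model from observed transitions and filters value vectors down to the Pareto optimal (non-dominated) ones. It is not the authors' implementation: the function names, the tuple layout of a transition, and the toy two-objective rewards are all assumptions made for this example, and the paper's dynamic programming step and exploration strategies are not reproduced here.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Maximum-likelihood model from (state, action, reward_vector, next_state) tuples.

    Hypothetical helper: returns transition probabilities P[(s, a)][s2]
    and mean reward vectors R[(s, a)], both estimated from raw counts.
    """
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = {}
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        prev = reward_sums.get((s, a), (0.0,) * len(r))
        reward_sums[(s, a)] = tuple(p + x for p, x in zip(prev, r))

    P, R = {}, {}
    for sa, successors in counts.items():
        n = sum(successors.values())
        P[sa] = {s2: c / n for s2, c in successors.items()}
        R[sa] = tuple(x / n for x in reward_sums[sa])
    return P, R


def dominates(u, v):
    """True if u is at least as good as v in every objective and better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Return the sorted non-dominated vectors (all objectives maximised)."""
    unique = set(vectors)
    return sorted(v for v in unique if not any(dominates(u, v) for u in unique))


# Toy data (assumed): objective 1 is a treasure value, objective 2 a time penalty.
transitions = [
    ("s0", "right", (0.0, -1.0), "s1"),
    ("s0", "right", (0.0, -1.0), "s1"),
    ("s0", "down", (1.0, -1.0), "goal"),
]
P, R = estimate_model(transitions)
front = pareto_front([(1, -1), (2, -3), (1, -2), (2, -3)])
```

In this toy case `(1, -2)` is dropped because `(1, -1)` is at least as good in both objectives and strictly better in the second, while `(1, -1)` and `(2, -3)` both remain because neither dominates the other.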

Original language: English
Title of host publication: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 9-12 December 2014, Orlando, Florida
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1-6
ISBN (Electronic): 978-1-4799-4552-8
ISBN (Print): 9781479945535
DOIs
Publication status: Published - 14 Jan 2014
Externally published: Yes
Event: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2014) - Orlando, United States
Duration: 9 Dec 2014 to 12 Dec 2014

Conference

Conference: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2014)
Abbreviated title: ADPRL 2014
Country/Territory: United States
City: Orlando
Period: 9/12/14 to 12/12/14
