Exploration versus exploitation trade-off in infinite horizon Pareto multi-armed bandits algorithms

M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Multi-objective multi-armed bandits (MOMAB) are multi-armed bandits (MAB) extended to reward vectors. We use the Pareto dominance relation to assess the quality of reward vectors, as opposed to scalarization functions. In this paper, we study the exploration vs exploitation trade-off in infinite horizon MOMAB algorithms. Single objective MABs explore the suboptimal arms and exploit a single optimal arm. MOMABs explore the suboptimal arms, but they also need to fairly exploit all optimal arms. We study the exploration vs exploitation trade-off of the Pareto UCB1 algorithm. We extend UCB2, another popular infinite horizon MAB algorithm, to reward vectors using the Pareto dominance relation. We analyse the properties of the proposed MOMAB algorithms in terms of upper regret bounds. We experimentally compare the exploration vs exploitation trade-off of the proposed MOMAB algorithms on a bi-objective Bernoulli environment coming from control theory.
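To make the Pareto-based selection concrete, here is a minimal Python sketch of a Pareto UCB1-style rule: a UCB1 bonus is added to each objective of an arm's empirical mean vector, and an arm is then pulled uniformly at random from the Pareto front of those index vectors (this is what "fairly exploiting all optimal arms" amounts to). Note this is illustrative only: the paper's Pareto UCB1 uses a confidence term adjusted for the number of objectives and Pareto-optimal arms, whereas this sketch uses the plain single-objective UCB1 bonus, and all function names and the toy environment are my own.

```python
import math
import random

def pareto_dominates(u, v):
    """True if vector u Pareto-dominates v: u is at least as good as v
    in every objective and strictly better in at least one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_front(vectors):
    """Indices of the vectors not Pareto-dominated by any other vector."""
    return [i for i, u in enumerate(vectors)
            if not any(pareto_dominates(v, u)
                       for j, v in enumerate(vectors) if j != i)]

def pareto_ucb1(pull, n_arms, n_objectives, horizon, rng=random.Random(0)):
    """Simplified Pareto UCB1 sketch: add a UCB1-style exploration bonus
    to every objective of each arm's empirical mean vector, then pull an
    arm chosen uniformly at random from the Pareto front of the indices."""
    counts = [0] * n_arms
    sums = [[0.0] * n_objectives for _ in range(n_arms)]
    # Initialisation: pull each arm once.
    for i in range(n_arms):
        reward = pull(i)
        counts[i] += 1
        sums[i] = [s + r for s, r in zip(sums[i], reward)]
    for t in range(n_arms, horizon):
        indices = []
        for i in range(n_arms):
            bonus = math.sqrt(2.0 * math.log(t + 1) / counts[i])
            indices.append([s / counts[i] + bonus for s in sums[i]])
        # Fair exploitation: uniform choice among Pareto-optimal indices.
        i = rng.choice(pareto_front(indices))
        reward = pull(i)
        counts[i] += 1
        sums[i] = [s + r for s, r in zip(sums[i], reward)]
    return counts

# Illustrative bi-objective environment with deterministic mean rewards
# standing in for Bernoulli samples; arms 0 and 1 are Pareto-optimal,
# arm 2 is dominated by both.
means = [(0.9, 0.2), (0.2, 0.9), (0.1, 0.1)]
counts = pareto_ucb1(lambda i: means[i], n_arms=3, n_objectives=2,
                     horizon=300)
```

With this rule the dominated arm is still explored (its bonus grows while it goes unpulled), but the two Pareto-optimal arms receive the bulk of the pulls in roughly equal shares, mirroring the exploration vs exploitation trade-off the abstract describes.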

Original language: English
Title of host publication: Proceedings of the International Conference on Agents and Artificial Intelligence: Lisbon, Portugal, 10-12 January 2015
Place of Publication: s.l.
Publisher: SciTePress Digital Library
Pages: 66-77
Number of pages: 12
Volume: 2
ISBN (Print): 9789897580741
Publication status: Published - 2015
Externally published: Yes
Event: 7th International Conference on Agents and Artificial Intelligence (ICAART 2015) - Lisbon, Portugal
Duration: 10 Jan 2015 - 12 Jan 2015
Conference number: 7
Internet address: http://www.icaart.org/?y=2015

Conference

Conference: 7th International Conference on Agents and Artificial Intelligence (ICAART 2015)
Abbreviated title: ICAART 2015
Country/Territory: Portugal
City: Lisbon
Period: 10/01/15 - 12/01/15
Other: Conference held in conjunction with the 4th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015) and the 4th International Conference on Operations Research and Enterprise Systems (ICORES 2015)

Keywords

  • Infinite horizon policies
  • Multi-armed bandits
  • Multi-objective optimisation
  • Pareto dominance relation
