
Infinite horizon multi-armed bandits with reward vectors: exploration/exploitation trade-off

  • M.M. Drugan

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review


Abstract

We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MAB) with reward vectors. The Pareto dominance relation assesses the quality of reward vectors in infinite horizon MABs, such as the UCB1 and UCB2 algorithms. In single-objective MABs, there is a trade-off between exploring the suboptimal arms and exploiting a single optimal arm. Pareto dominance-based MABs fairly exploit all Pareto optimal arms and explore suboptimal arms. We study the exploration vs. exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds, and we experimentally compare their exploration vs. exploitation trade-off on a bi-objective Bernoulli environment coming from control theory.
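
As a rough illustration of the ideas named in the abstract, the following minimal Python sketch shows a Pareto dominance test and a UCB1-style selection loop over reward vectors. It is not the algorithm analysed in the paper: the exploration bonus is assumed to be the standard single-objective UCB1 term sqrt(2 ln t / n_i) applied to every objective, the arm is assumed to be chosen uniformly at random among arms whose index vectors are non-dominated, and all names (`dominates`, `pareto_front`, `pareto_ucb1_sketch`, `arm_probs`) are hypothetical.

```python
import math
import random


def dominates(u, v):
    """True if u Pareto-dominates v (maximisation):
    u is at least as good in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Indices of the vectors that no other vector dominates."""
    return [
        i for i, u in enumerate(vectors)
        if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)
    ]


def pareto_ucb1_sketch(arm_probs, horizon, rng):
    """Run a UCB1-style loop on Bernoulli reward vectors.

    arm_probs[i][d] is the success probability of arm i in objective d.
    Returns the number of times each arm was pulled.
    """
    k, dims = len(arm_probs), len(arm_probs[0])
    pulls = [0] * k
    sums = [[0.0] * dims for _ in range(k)]

    def pull(i):
        # Draw one Bernoulli reward per objective and update the statistics.
        pulls[i] += 1
        for d in range(dims):
            sums[i][d] += 1.0 if rng.random() < arm_probs[i][d] else 0.0

    # Initialisation: play every arm once.
    for i in range(k):
        pull(i)

    for t in range(k + 1, horizon + 1):
        index = []
        for i in range(k):
            # Assumed bonus: the standard UCB1 term, added to each objective.
            bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
            index.append([sums[i][d] / pulls[i] + bonus for d in range(dims)])
        # Exploit all arms on the (optimistic) Pareto front fairly.
        pull(rng.choice(pareto_front(index)))
    return pulls


if __name__ == "__main__":
    # Two Pareto optimal arms and one dominated arm (illustrative values).
    probs = [(0.9, 0.5), (0.5, 0.9), (0.1, 0.1)]
    print(pareto_ucb1_sketch(probs, horizon=2000, rng=random.Random(0)))
```

In this sketch the two non-dominated arms share most of the pulls, while the dominated arm is sampled only as long as its confidence bonus keeps its index vector on the front.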

Original language: English
Title of host publication: Agents and Artificial Intelligence
Subtitle of host publication: 7th International Conference, ICAART 2015, Lisbon, Portugal, January 10-12, 2015, Revised Selected Papers
Editors: B. Duval, J. van den Herik, St. Loiseau, J. Filipe
Place of Publication: Berlin
Publisher: Springer
Pages: 128-144
Number of pages: 17
ISBN (Electronic): 978-3-319-27947-3
ISBN (Print): 9783319279466
Publication status: Published - 2015
Externally published: Yes
Event: 7th International Conference on Agents and Artificial Intelligence (ICAART 2015) - Lisbon, Portugal
Duration: 10 Jan 2015 to 12 Jan 2015
Conference number: 7
http://www.icaart.org/?y=2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9494
ISSN (Print): 03029743
ISSN (Electronic): 16113349

Conference

Conference: 7th International Conference on Agents and Artificial Intelligence (ICAART 2015)
Abbreviated title: ICAART 2015
Country/Territory: Portugal
City: Lisbon
Period: 10/01/15 to 12/01/15
Other: Conference held in conjunction with the 4th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015) and the 4th International Conference on Operations Research and Enterprise Systems (ICORES 2015)

Keywords

  • Infinite horizon policies
  • Multi-armed bandits
  • Multi-objective optimisation
  • Pareto dominance relation
