Knowledge gradient for multi-objective multi-armed bandit algorithms

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

20 Citations (Scopus)

Abstract

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to efficiently explore the Pareto optimal arms. We consider two order relations on the mean vectors: the Pareto partial order and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms the multi-objective arms into single-objective arms to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the performance of the KG policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
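For readers unfamiliar with the two order relations mentioned in the abstract, the sketch below illustrates them on estimated mean vectors: a Pareto-dominance test that yields the Pareto front (the set of arms the Pareto KG variant explores), and a linear scalarization that collapses each mean vector to a scalar (the reduction the scalarized KG variant applies before a single-objective policy takes over). This is a minimal illustration under assumed conventions, not the authors' implementation; the function names (`dominates`, `pareto_front`, `linear_scalarization`) and the example mean vectors are hypothetical.

```python
# Minimal sketch: Pareto dominance and linear scalarization on
# estimated mean vectors of a multi-objective bandit.
# Not the paper's implementation; names and data are illustrative.

import numpy as np

def dominates(u, v):
    """True if mean vector u Pareto-dominates v:
    u >= v in every objective and u > v in at least one."""
    return np.all(u >= v) and np.any(u > v)

def pareto_front(means):
    """Indices of arms whose mean vectors are not dominated
    by any other arm (the Pareto optimal arms)."""
    return [i for i, u in enumerate(means)
            if not any(dominates(v, u)
                       for j, v in enumerate(means) if j != i)]

def linear_scalarization(means, weights):
    """Collapse each mean vector to a scalar with a weight vector;
    the best arm under a linear scalarization is Pareto optimal."""
    return int(np.argmax(means @ weights))

if __name__ == "__main__":
    # Five arms, two objectives (hypothetical estimated means).
    means = np.array([[0.9, 0.1],
                      [0.7, 0.7],
                      [0.1, 0.9],
                      [0.5, 0.5],   # dominated by [0.7, 0.7]
                      [0.6, 0.4]])  # dominated by [0.7, 0.7]
    print("Pareto optimal arms:", pareto_front(means))  # -> [0, 1, 2]
    print("Best arm for weights (0.5, 0.5):",
          linear_scalarization(means, np.array([0.5, 0.5])))  # -> 1
```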

Original language: English
Title of host publication: ICAART 2014 - Proceedings of the 6th International Conference on Agents and Artificial Intelligence, 6-8 March 2014, Angers, France
Publisher: SciTePress Digital Library
Pages: 74-83
Number of pages: 10
Volume: 1
ISBN (Print): 9789897580154
Publication status: Published - 2014
Externally published: Yes
Event: 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) - Angers, France
Duration: 6 Mar 2014 – 8 Mar 2014
Conference number: 6
http://www.icaart.org/?y=2014

Conference

Conference: 6th International Conference on Agents and Artificial Intelligence (ICAART 2014)
Abbreviated title: ICAART 2014
Country/Territory: France
City: Angers
Period: 6/03/14 – 8/03/14
Other: Conference held in conjunction with the 3rd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2014) and the 3rd International Conference on Operations Research and Enterprise Systems (ICORES 2014)
Internet address: http://www.icaart.org/?y=2014

Keywords

  • Knowledge gradient policy
  • Multi-armed bandit problems
  • Multi-objective optimization
