Knowledge gradient for multi-objective multi-armed bandit algorithms

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

15 Citations (Scopus)


We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto optimal arms efficiently. We consider two partial order relations for ordering the mean vectors of the arms: Pareto dominance and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms each multi-objective arm into a single-objective arm and then finds the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the KG policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
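The two order relations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every objective is maximized, that the true mean vectors are known, and that scalarization is a linear weighted sum (the abstract does not say which scalarization functions are used). The function names, the mean vectors, and the weights are all hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if mean vector u Pareto-dominates v (maximization assumed)."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_optimal_arms(means: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a mean vector into one number (linear weighting is an assumption)."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(
    means: Sequence[Sequence[float]], weights: Sequence[float]
) -> int:
    """Return the index of the arm with the largest scalarized mean."""
    return max(range(len(means)), key=lambda i: linear_scalarize(means[i], weights))


# Hypothetical two-objective problem with four arms (illustrative values only).
means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]

front = pareto_optimal_arms(means)                     # arm 3 is dominated by arm 1
favor_first = best_scalarized_arm(means, (0.7, 0.3))   # weights favor objective 1
favor_second = best_scalarized_arm(means, (0.3, 0.7))  # weights favor objective 2
```

In this toy setting the Pareto relation keeps several incomparable arms, whereas each weight vector singles out one arm, so recovering several Pareto optimal arms through scalarization would typically require more than one weight vector.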

Original language: English
Title of host publication: ICAART 2014 - Proceedings of the 6th International Conference on Agents and Artificial Intelligence, 6-8 March 2014, Angers, France
Publisher: SCITEPRESS - Science and Technology Publications, Lda.
Number of pages: 10
ISBN (Print): 9789897580154
Publication status: Published - 2014
Externally published: Yes
Event: 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) - Angers, France
Duration: 6 Mar 2014 to 8 Mar 2014
Conference number: 6


Conference: 6th International Conference on Agents and Artificial Intelligence (ICAART 2014)
Abbreviated title: ICAART 2014
Other: Conference held in conjunction with the 3rd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2014) and the 3rd International Conference on Operations Research and Enterprise Systems (ICORES 2014)


  • Knowledge gradient policy
  • Multi-armed bandit problems
  • Multi-objective optimization
