Linear scalarized knowledge gradient in the multi-objective multi-armed bandits problem

Saba Yahyaa, Madalina M. Drugan, Bernard Manderick

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

2 Citations (Scopus)


The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might conflict. The agent has a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into single-objective arms. The LS function is simple; however, it cannot find the entire optimal arm set. We therefore extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
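To make the limitation of linear scalarization concrete, the following minimal sketch sweeps weight vectors over three arms with two objectives. The mean vectors and the helper names (`linear_scalarize`, `dominates`) are made up for illustration and are not taken from the paper. The sketch works on known mean vectors only; it does not implement the KG policy or either LS-KG variant.

```python
# Hypothetical mean reward vectors (two objectives) for three arms.
# None of them is dominated by another, so all three are Pareto optimal.
means = [
    (1.0, 0.0),  # arm 0: best in objective 1
    (0.0, 1.0),  # arm 1: best in objective 2
    (0.4, 0.4),  # arm 2: a compromise below the line joining arms 0 and 1
]

def linear_scalarize(mean, weights):
    """Weighted sum of a mean vector; weights are non-negative and sum to 1."""
    return sum(w * m for w, m in zip(weights, mean))

def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Pareto optimal arms: those not dominated by any other arm.
pareto_set = {
    i for i, m in enumerate(means)
    if not any(dominates(o, m) for j, o in enumerate(means) if j != i)
}

# Sweep weight vectors (w, 1 - w) and record the arm with the highest scalarized value.
selected = set()
steps = 100
for k in range(steps + 1):
    w = k / steps
    scores = [linear_scalarize(m, (w, 1.0 - w)) for m in means]
    selected.add(scores.index(max(scores)))

print("Pareto optimal arms:", sorted(pareto_set))
print("Arms selected by some weight vector:", sorted(selected))
```

With these illustrative means, arm 2 is Pareto optimal but is never selected: its scalarized value is 0.4 for every weight vector, while the better of arms 0 and 1 always scores at least 0.5.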

Original language: English
Title of host publication: 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2014 - Proceedings publication
Number of pages: 6
ISBN (Print): 9782874190957
Publication status: Published - 2014
Event: 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014) - Bruges, Belgium
Duration: 23 Apr 2014 to 25 Apr 2014


Conference: 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014)

