Linear scalarized knowledge gradient in the multi-objective multi-armed bandits problem

Saba Yahyaa, Madalina M. Drugan, Bernard Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

2 Citations (Scopus)

Abstract

The multi-objective, multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might be conflicting. The agent has a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into one-objective arms. The LS function is simple; however, it cannot find all the arms in the optimal set. As a result, we extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
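To illustrate the core idea in the abstract, the sketch below shows how a linear scalarization function collapses each arm's vector-valued mean reward into a scalar via a weighted sum, after which a standard single-objective rule (here, a simple argmax) can rank the arms. The arm means and weight vector are illustrative assumptions, not values from the paper, and the sketch omits the KG exploration bonus the paper adds on top of this step.

```python
def linear_scalarize(reward_vector, weights):
    """Weighted sum: converts a multi-objective reward into one scalar."""
    return sum(w * r for w, r in zip(weights, reward_vector))

# Hypothetical 2-objective bandit: estimated mean reward vector per arm.
arm_means = [(0.55, 0.50), (0.53, 0.51), (0.52, 0.54)]

# One illustrative weight set, emphasising the first objective.
weights = (0.8, 0.2)

# Scalarize every arm, then pick the arm with the best scalar value.
scalar_values = [linear_scalarize(mu, weights) for mu in arm_means]
best_arm = max(range(len(arm_means)), key=lambda a: scalar_values[a])
```

Different weight vectors favour different trade-offs between the objectives, which is why a single LS function cannot in general recover every Pareto-optimal arm; the paper's LS-KG variants address this by combining scalarization with the knowledge gradient policy.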

Original language: English
Title of host publication: 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2014 - Proceedings
Publisher: i6doc.com publication
Pages: 147-152
Number of pages: 6
ISBN (Print): 9782874190957
Publication status: Published - 2014
Event: 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014) - Bruges, Belgium
Duration: 23 Apr 2014 - 25 Apr 2014

Conference

Conference: 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014)
Country/Territory: Belgium
City: Bruges
Period: 23/04/14 - 25/04/14

