Abstract
The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards in which each arm generates a vector of rewards instead of a single reward, and these rewards may be conflicting. There is a set of optimal arms, and the agent's goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into one-objective arms. The LS function is simple; however, it cannot find all the arms in the optimal set. We therefore extend the knowledge gradient (KG) policy to the LS function and propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
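The core idea of linear scalarization mentioned in the abstract can be illustrated with a minimal sketch: each arm's mean reward vector is collapsed into a scalar by a weighted sum, after which a standard single-objective selection rule applies. The arm names and weight values below are illustrative assumptions, and the sketch does not reproduce the paper's LS-KG variants, only the scalarization step itself.

```python
def linear_scalarize(mean_vector, weights):
    # Weighted sum across objectives: f(mu) = sum_d w_d * mu_d.
    # Converts a multi-objective reward vector into a single scalar score.
    return sum(w * m for w, m in zip(weights, mean_vector))

# Hypothetical example: 3 arms with 2 conflicting objectives each.
means = {"arm0": [0.9, 0.1], "arm1": [0.5, 0.5], "arm2": [0.1, 0.9]}
w = [0.7, 0.3]  # assumed weight vector; different weights favor different arms

scores = {arm: linear_scalarize(mu, w) for arm, mu in means.items()}
best = max(scores, key=scores.get)  # arm with the highest scalarized value
```

Note that a single weight vector can only ever favor one point on the trade-off surface, which is one intuition behind the abstract's remark that the LS function alone cannot recover all optimal arms.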
| Original language | English |
|---|---|
| Title of host publication | 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2014 - Proceedings |
| Publisher | i6doc.com publication |
| Pages | 147-152 |
| Number of pages | 6 |
| ISBN (Print) | 9782874190957 |
| Publication status | Published - 2014 |
| Event | 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014) - Bruges, Belgium Duration: 23 Apr 2014 → 25 Apr 2014 |