Abstract
The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards in which each arm generates a vector of rewards instead of a single reward, and these rewards may conflict with one another. The agent therefore faces a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into single-objective arms. The LS function is simple; however, it cannot find the complete optimal arm set. We therefore extend the knowledge gradient (KG) policy to the LS function and propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
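To make the two scalarization orders concrete, below is a minimal Python sketch, not the paper's exact algorithm. The weight vector `w`, the way uncertainty is scalarized, and all function names are illustrative assumptions; the exploration bonus uses the standard Gaussian knowledge-gradient index f(z) = zΦ(z) + φ(z) from the KG literature.

```python
import numpy as np
from scipy.stats import norm

def kg_index(means, stds):
    """Standard Gaussian knowledge-gradient index: stds * f(z) with
    f(z) = z * Phi(z) + phi(z), where z is the (negative) normalized
    gap between each arm and its best competitor."""
    best_other = np.array([
        np.max(np.delete(means, i)) for i in range(len(means))
    ])
    z = -np.abs(means - best_other) / stds
    return stds * (z * norm.cdf(z) + norm.pdf(z))

def ls_kg_across_arms(mean_vectors, std_vectors, w):
    """LS-KG across arms (sketch): scalarize each arm's mean-reward
    vector with weights w first, then compute one KG index per arm
    on the scalarized values."""
    scal_means = mean_vectors @ w   # one scalar per arm
    scal_stds = std_vectors @ w     # crude scalarized uncertainty (assumption)
    return np.argmax(scal_means + kg_index(scal_means, scal_stds))

def ls_kg_across_dims(mean_vectors, std_vectors, w):
    """LS-KG across dimensions (sketch): compute a KG bonus per
    objective first, then scalarize means plus bonuses with w."""
    bonus = np.column_stack([
        kg_index(mean_vectors[:, d], std_vectors[:, d])
        for d in range(mean_vectors.shape[1])
    ])
    return np.argmax((mean_vectors + bonus) @ w)

# Example: 3 arms, 2 conflicting objectives, equal weights.
mu = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
sd = np.full_like(mu, 0.2)
w = np.array([0.5, 0.5])
print(ls_kg_across_arms(mu, sd, w), ls_kg_across_dims(mu, sd, w))
```

The design difference is only where scalarization happens: across arms scalarizes the reward vectors before computing a single KG index per arm, while across dimensions computes a per-objective KG bonus and scalarizes afterwards.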
Original language | English |
---|---|
Title of host publication | 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2014 - Proceedings |
Publisher | i6doc.com publication |
Pages | 147-152 |
Number of pages | 6 |
ISBN (Print) | 9782874190957 |
Publication status | Published - 2014 |
Event | 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014), Bruges, Belgium, 23 Apr 2014 → 25 Apr 2014 |