Abstract
We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto-optimal arms efficiently. We consider two partial order relations for ordering the mean vectors: Pareto dominance and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms the multi-objective arms into single-objective arms to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the knowledge gradient policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
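The two order relations named in the abstract can be illustrated with a minimal sketch. It is not the paper's KG policy: it only shows Pareto dominance over mean vectors and a scalarization that collapses each vector to one number. The linear weighted sum, the maximization convention, the function names, and the example mean vectors are all assumptions made for illustration.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v, assuming every objective is maximized:
    u is at least as good in all objectives and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed scalarization: weighted sum of the objectives."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(means: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the arm with the largest scalarized mean."""
    return max(range(len(means)), key=lambda i: linear_scalarize(means[i], weights))


# Hypothetical true mean vectors for four arms with two objectives each.
means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]

front = pareto_front(means)                    # arm 3 is dominated by arm 1
best = best_scalarized_arm(means, (0.3, 0.7))  # weights favour objective 2
print(front, best)
```

Under the Pareto relation several arms can be optimal at once, whereas a fixed weight vector singles out one arm, which is why the two relations lead to different notions of an optimal arm.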
Original language | English |
---|---|
Title of host publication | ICAART 2014 - Proceedings of the 6th International Conference on Agents and Artificial Intelligence, 6-8 March 2014, Angers, France |
Publisher | SciTePress Digital Library |
Pages | 74-83 |
Number of pages | 10 |
Volume | 1 |
ISBN (Print) | 9789897580154 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) - Angers, France; Duration: 6 Mar 2014 → 8 Mar 2014; Conference number: 6; http://www.icaart.org/?y=2014 |
Conference
Conference | 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) |
---|---|
Abbreviated title | ICAART 2014 |
Country/Territory | France |
City | Angers |
Period | 6/03/14 → 8/03/14 |
Other | Conference held in conjunction with the 3rd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2014) and the 3rd International Conference on Operations Research and Enterprise Systems (ICORES 2014) |
Keywords
- Knowledge gradient policy
- Multi-armed bandit problems
- Multi-objective optimization