Abstract
We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto-optimal arms efficiently. We consider two partial order relations for ordering the mean reward vectors: Pareto dominance and scalarization functions. Pareto KG finds the optimal arms by Pareto search, whereas scalarized KG transforms the multi-objective arms into single-objective arms and then finds the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the KG policies with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
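The two orderings named in the abstract can be illustrated with a minimal Python sketch. It assumes maximization, independent Gaussian beliefs for every arm and objective, a known noise variance per objective, and a linear scalarization with user-chosen weights. The function names (`dominates`, `pareto_front`, `kg_scores`, `linear_scalarized_kg`) are hypothetical, and the scalarized variant is one plausible construction, not necessarily the paper's exact formulation. The KG value uses the standard closed form for independent normal beliefs, σ̃·(zΦ(z) + φ(z)), where σ̃ is the standard deviation of the change in the posterior mean after one more pull.

```python
import math
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v (maximization): u >= v in every objective, > in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [i for i, u in enumerate(means)
            if not any(dominates(v, u) for j, v in enumerate(means) if j != i)]


def _normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_scores(mu: List[float], sigma_tilde: List[float]) -> List[float]:
    """Single-objective KG for independent normal beliefs (needs >= 2 arms, sigma_tilde > 0)."""
    scores = []
    for x in range(len(mu)):
        best_other = max(m for i, m in enumerate(mu) if i != x)
        z = -abs(mu[x] - best_other) / sigma_tilde[x]
        scores.append(sigma_tilde[x] * (z * _normal_cdf(z) + _normal_pdf(z)))
    return scores


def linear_scalarized_kg(mu_vec: List[List[float]], var_vec: List[List[float]],
                         noise_var: List[float], weights: List[float]) -> List[float]:
    """Hypothetical scalarized KG: collapse each arm to w . mu, then apply single-objective KG.

    Assumes a pull reveals the full reward vector and objectives are independent, so the
    per-objective changes in posterior mean (variance s^2 / (s + n)) add with weights w^2.
    """
    mu = [sum(w * m for w, m in zip(weights, ms)) for ms in mu_vec]
    sigma_tilde = [math.sqrt(sum(w * w * s * s / (s + n)
                                 for w, s, n in zip(weights, vs, noise_var)))
                   for vs in var_vec]
    return kg_scores(mu, sigma_tilde)
```

A scalarized KG policy would then pull the arm with the largest score; with equal posterior means, the arm with the larger posterior variance receives the larger KG value, which is how the policy trades exploration against exploitation.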
Original language | English |
---|---|
Title | ICAART 2014 - Proceedings of the 6th International Conference on Agents and Artificial Intelligence, 6-8 March 2014, Angers, France |
Publisher | SciTePress Digital Library |
Pages | 74-83 |
Number of pages | 10 |
Volume | 1 |
ISBN (print) | 9789897580154 |
Status | Published - 2014 |
Externally published | Yes |
Event | 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) - Angers, France. Duration: 6 Mar 2014 → 8 Mar 2014. Conference number: 6. http://www.icaart.org/?y=2014 |
Conference
Conference | 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) |
---|---|
Acronym | ICAART 2014 |
Country/Territory | France |
City | Angers |
Period | 6/03/14 → 8/03/14 |
Internet address | http://www.icaart.org/?y=2014 |