Knowledge gradient for multi-objective multi-armed bandit algorithms

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer review

20 Citations (Scopus)

Abstract

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto-optimal arms efficiently. We consider two partial order relations for ordering the mean vectors: Pareto dominance and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms each multi-objective arm into a single-objective arm before finding the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the knowledge gradient policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
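The two orderings of mean vectors named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example mean vectors, and the weight vectors are hypothetical, and linear scalarization is assumed here as just one possible scalarization function. The sketch only shows how arms are ordered once their mean vectors are known; it does not implement the KG or UCB1 exploration policies.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v under maximization:
    u is at least as good in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a mean vector into one value (assumed linear scalarization)."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(means: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the arm with the largest scalarized mean for the given weights."""
    scores = [linear_scalarize(m, weights) for m in means]
    return max(range(len(means)), key=scores.__getitem__)


# Hypothetical two-objective mean vectors for six arms (illustration only).
MEANS = [
    (0.55, 0.50),
    (0.53, 0.51),
    (0.52, 0.54),
    (0.50, 0.57),
    (0.51, 0.51),  # dominated by arm 1
    (0.50, 0.50),  # dominated by arm 0
]

if __name__ == "__main__":
    print("Pareto-optimal arms:", pareto_front(MEANS))
    print("Best arm for weights (0.5, 0.5):", best_scalarized_arm(MEANS, (0.5, 0.5)))
```

In this sketch the Pareto ordering returns a set of incomparable arms, whereas each weight vector picks out a single arm, which is why a scalarized approach typically needs several weight vectors to cover the Pareto front.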

Original language: English
Title: ICAART 2014 - Proceedings of the 6th International Conference on Agents and Artificial Intelligence, 6-8 March 2014, Angers, France
Publisher: SciTePress Digital Library
Pages: 74-83
Number of pages: 10
Volume: 1
ISBN (print): 9789897580154
Status: Published - 2014
Published externally: Yes
Event: 6th International Conference on Agents and Artificial Intelligence (ICAART 2014) - Angers, France
Duration: 6 Mar 2014 to 8 Mar 2014
Conference number: 6
http://www.icaart.org/?y=2014

Conference

Conference: 6th International Conference on Agents and Artificial Intelligence (ICAART 2014)
Abbreviated title: ICAART 2014
Country/Region: France
City: Angers
Period: 6/03/14 to 8/03/14