The scalarized multi-objective multi-armed bandit problem: an empirical study of its exploration vs. exploitation tradeoff

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

16 Citations (Scopus)

Abstract

The multi-armed bandit (MAB) problem is the simplest sequential decision process with stochastic rewards, in which an agent repeatedly chooses among different arms to identify the optimal arm as soon as possible, i.e., the one with the highest mean reward. Both the knowledge gradient (KG) policy and the upper confidence bound (UCB) policy work well in practice for the MAB problem because they strike a good balance between exploitation and exploration when choosing arms. In the multi-objective MAB (MOMAB) problem, each arm generates a vector of rewards, one component per objective, instead of a single scalar reward. In this paper, we extend the KG policy to multi-objective problems using scalarization functions that transform a reward vector into a single scalar reward. We consider different scalarization functions and call the corresponding class of algorithms scalarized KG. We compare the resulting algorithms with the corresponding variants of the multi-objective UCB1 policy (MO-UCB1) on a number of MOMAB problems in which the reward vectors are drawn from a multivariate normal distribution. We compare the exploration versus exploitation trade-off experimentally and conclude that scalarized KG outperforms MO-UCB1 on these test problems.
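
As a rough illustration of the scalarization idea described in the abstract, the sketch below runs the standard UCB1 index on linearly scalarized reward vectors. It is not the authors' implementation: the function names (`linear_scalarize`, `scalarized_ucb1`), the weighted-sum scalarization, the weights, the noise level `sigma`, and the use of independent normal noise per objective (a diagonal special case of the multivariate normal mentioned above) are all assumptions made for the example. The KG policy itself is not shown.

```python
import math
import random


def linear_scalarize(reward_vector, weights):
    """Weighted sum of the reward components (assumed scalarization)."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def scalarized_ucb1(arm_means, weights, horizon, sigma=0.1, seed=0):
    """Run UCB1 on scalarized rewards and return the pull count of each arm.

    arm_means: one mean reward vector per arm (hypothetical test problem).
    weights:   scalarization weights, one per objective.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # running sum of scalarized rewards per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Assumption: independent Gaussian noise on each objective.
        reward = [rng.gauss(mu, sigma) for mu in arm_means[arm]]
        sums[arm] += linear_scalarize(reward, weights)
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    means = [[0.2, 0.2], [0.8, 0.8], [0.5, 0.5]]  # three arms, two objectives
    print(scalarized_ucb1(means, weights=[0.5, 0.5], horizon=2000))
```

With a single fixed weight vector the problem reduces to an ordinary scalar bandit; trying several weight vectors, or a different scalarization such as a Chebyshev-style function, would be the natural way to cover more of the trade-off between objectives.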

Original language: English
Title of host publication: 2014 International Joint Conference on Neural Networks (IJCNN), 6-11 July 2014, Beijing, China
Place of publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 2290-2297
Number of pages: 8
ISBN (print): 9781479914845
Status: Published - 3 Sep 2014
Externally published: Yes
Event: 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing International Convention Center, Beijing, China
Duration: 6 Jul 2014 to 11 Jul 2014
http://www.ieee-wcci2014.org

Conference

Conference: 2014 International Joint Conference on Neural Networks, IJCNN 2014
Abbreviated title: IJCNN 2014
Country/Territory: China
City: Beijing
Period: 6/07/14 to 11/07/14
Other: International Joint Conference on Neural Networks
