The scalarized multi-objective multi-armed bandit problem: an empirical study of its exploration vs. exploitation tradeoff

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

15 Citations (Scopus)

Abstract

The multi-armed bandit (MAB) problem is the simplest sequential decision process with stochastic rewards: an agent repeatedly chooses among different arms to identify, as soon as possible, the optimal arm, i.e. the one with the highest mean reward. Both the knowledge gradient (KG) policy and the upper confidence bound (UCB) policy work well in practice for the MAB problem because they strike a good balance between exploitation and exploration when choosing arms. In the multi-objective MAB (MOMAB) problem, each arm generates a vector of rewards, one per objective, instead of a single scalar reward. In this paper, we extend the KG policy to multi-objective problems using scalarization functions that transform a reward vector into a single scalar reward. We consider different scalarization functions and call the corresponding class of algorithms scalarized KG. We compare the resulting algorithms with the corresponding variants of the multi-objective UCB1 policy (MO-UCB1) on a number of MOMAB problems where the reward vectors are drawn from a multivariate normal distribution. We compare the exploration versus exploitation trade-off experimentally and conclude that scalarized KG outperforms MO-UCB1 on these test problems.
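
To make the idea of scalarization concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes two commonly used scalarization functions (a weighted sum and a weighted Chebyshev form); the abstract does not say which functions the paper uses. It also assumes the standard UCB1 index applied to the scalarized mean, and it simplifies the multivariate normal rewards to independent normal noise per objective. All function names, parameters, and the example arm means are hypothetical.

```python
import math
import random


def linear_scalarize(reward, weights):
    """Weighted sum of the objectives (assumed scalarization, to be maximized)."""
    return sum(w * r for w, r in zip(weights, reward))


def chebyshev_scalarize(reward, weights, reference):
    """Weighted Chebyshev form (assumed): worst weighted gain over a reference point."""
    return min(w * (r - z) for w, r, z in zip(weights, reward, reference))


def scalarized_ucb1(true_means, weights, horizon, sigma=0.1, seed=0):
    """Run UCB1 on linearly scalarized mean reward vectors.

    true_means: one mean reward vector per arm (hypothetical test problem).
    Rewards are drawn with independent normal noise per objective, a
    simplification of a general multivariate normal distribution.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    n_objectives = len(weights)
    counts = [0] * n_arms
    sums = [[0.0] * n_objectives for _ in range(n_arms)]

    def ucb_index(arm, t):
        # Scalarized empirical mean plus the usual UCB1 exploration bonus.
        mean_vector = [s / counts[arm] for s in sums[arm]]
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return linear_scalarize(mean_vector, weights) + bonus

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize its estimate
        else:
            arm = max(range(n_arms), key=lambda a: ucb_index(a, t))
        reward = [rng.gauss(m, sigma) for m in true_means[arm]]
        counts[arm] += 1
        for d, r in enumerate(reward):
            sums[arm][d] += r
    return counts


if __name__ == "__main__":
    # Three arms, two objectives; with these weights arm 0 has the best scalarized mean.
    arm_means = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
    print(scalarized_ucb1(arm_means, weights=[0.8, 0.2], horizon=2000))
```

Different weight vectors favor different arms, so in this sketch the choice of weights determines which trade-off between objectives the policy converges to.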

Original language: English
Title of host publication: 2014 International Joint Conference on Neural Networks (IJCNN), 6-11 July 2014, Beijing, China
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 2290-2297
Number of pages: 8
ISBN (Print): 9781479914845
Publication status: Published - 3 Sept 2014
Externally published: Yes
Event: 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing International Convention Center, Beijing, China
Duration: 6 Jul 2014 to 11 Jul 2014
http://www.ieee-wcci2014.org

Conference

Conference: 2014 International Joint Conference on Neural Networks, IJCNN 2014
Abbreviated title: IJCNN 2014
Country/Territory: China
City: Beijing
Period: 6/07/14 to 11/07/14
Other: International Joint Conference on Neural Networks
