The scalarized multi-objective multi-armed bandit problem: an empirical study of its exploration vs. exploitation tradeoff

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

9 Citations (Scopus)

Abstract

The multi-armed bandit (MAB) problem is the simplest sequential decision process with stochastic rewards, in which an agent chooses repeatedly from different arms to identify as soon as possible the optimal arm, i.e. the one with the highest mean reward. Both the knowledge gradient (KG) policy and the upper confidence bound (UCB) policy work well in practice for the MAB problem because they strike a good balance between exploitation and exploration while choosing arms. In the multi-objective MAB (MOMAB) problem, each arm generates a vector of rewards, one per objective, instead of a single scalar reward. In this paper, we extend the KG policy to multi-objective problems using scalarization functions that transform reward vectors into a single scalar reward. We consider different scalarization functions and call the corresponding class of algorithms scalarized KG. We compare the resulting algorithms with the corresponding variants of the multi-objective UCB1 policy (MO-UCB1) on a number of MOMAB problems where the reward vectors are drawn from a multivariate normal distribution. We compare experimentally the exploration versus exploitation trade-off and conclude that scalarized KG outperforms MO-UCB1 on these test problems.
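The core idea in the abstract is that a scalarization function collapses each arm's reward vector into a single number, after which a standard single-objective bandit policy can be applied. The following is a minimal illustrative sketch, not the paper's exact algorithm: it assumes a linear scalarization (weighted sum of objectives) combined with the standard UCB1 index; the function names `linear_scalarize` and `scalarized_ucb1` are hypothetical, chosen only for this example.

```python
import math

def linear_scalarize(reward_vec, weights):
    """Collapse a reward vector into one scalar: sum_i w_i * r_i.

    A linear scalarization is one of several possible choices; the paper
    compares different scalarization functions.
    """
    return sum(w * r for w, r in zip(weights, reward_vec))

def scalarized_ucb1(pull_counts, mean_scalar_rewards, t):
    """Pick an arm by the UCB1 index computed on scalarized rewards.

    pull_counts[a]         -- number of times arm a was pulled so far
    mean_scalar_rewards[a] -- empirical mean of the scalarized reward of arm a
    t                      -- total number of pulls so far (t >= 1)
    Assumes every arm has been pulled at least once.
    """
    return max(
        range(len(pull_counts)),
        key=lambda a: mean_scalar_rewards[a]
        + math.sqrt(2.0 * math.log(t) / pull_counts[a]),
    )

# Example: two arms, two objectives, equal weights on both objectives.
weights = [0.5, 0.5]
s = linear_scalarize([1.0, 2.0], weights)   # 0.5*1.0 + 0.5*2.0 = 1.5
arm = scalarized_ucb1([1, 1], [0.0, 1.0], t=2)  # arm 1 has the higher index
```

The same scalarize-then-select structure applies to the KG policy studied in the paper: only the index computed from the scalarized statistics changes, not the scalarization step itself.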

Original language: English
Title of host publication: 2014 International Joint Conference on Neural Networks (IJCNN), 6-11 July 2014, Beijing, China
Place of publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 2290-2297
Number of pages: 8
ISBN (Print): 9781479914845
DOI: https://doi.org/10.1109/IJCNN.2014.6889390
Publication status: Published - 3 Sep 2014
Externally published: Yes
Event: 2014 International Joint Conference on Neural Networks (IJCNN 2014), July 6-11, 2014, Beijing, China - Beijing International Convention Center, Beijing, China
Duration: 6 Jul 2014 - 11 Jul 2014
Internet address: http://www.ieee-wcci2014.org

Conference

Conference: 2014 International Joint Conference on Neural Networks (IJCNN 2014), July 6-11, 2014, Beijing, China
Abbreviated title: IJCNN 2014
Country: China
City: Beijing
Period: 6/07/14 - 11/07/14
Other: International Joint Conference on Neural Networks
Internet address: http://www.ieee-wcci2014.org


Cite this

Yahyaa, S. Q., Drugan, M. M., & Manderick, B. (2014). The scalarized multi-objective multi-armed bandit problem: an empirical study of its exploration vs. exploitation tradeoff. In 2014 International Joint Conference on Neural Networks (IJCNN), 6-11 July 2014, Beijing, China (pp. 2290-2297). [6889390] Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/IJCNN.2014.6889390