Thompson sampling in the adaptive linear scalarized multi objective multi armed bandit

S.Q. Yahyaa, M.M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

10 Citations (Scopus)

Abstract

In the stochastic multi-objective multi-armed bandit (MOMAB), arms generate a vector of stochastic normal rewards, one per objective, instead of a single scalar reward. As a result, there is not a single optimal arm but a set of optimal arms (the Pareto front) under the Pareto dominance relation. The goal of an agent is to find the Pareto front. To find the optimal arms, the agent can use a linear scalarization function that transforms a multi-objective problem into a single-objective problem by summing the weighted objectives. Selecting the weights is crucial, since different weights will result in selecting a different optimal arm from the Pareto front. Usually, a predefined set of weights is used, which can be computationally inefficient when different weights optimize the same Pareto optimal arm and some arms in the Pareto front are not identified. In this paper, we propose a number of techniques that adapt the weights on the fly in order to improve the performance of the scalarized MOMAB. We use genetic and adaptive scalarization functions from multi-objective optimization to generate new weights. We propose to use a Thompson sampling policy to frequently select the weights that identify new arms on the Pareto front. We experimentally show that Thompson sampling improves the performance of the genetic and adaptive scalarization functions. All the proposed techniques improve the performance of the standard scalarized MOMAB with a fixed set of weights.
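The ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the arm means, the weight set, and the helper names (`dominates`, `pareto_front`, `scalarize`, `best_arm`, `thompson_weight_selection`) are all hypothetical, the true mean vectors are assumed known rather than estimated from sampled rewards, and the Beta-Bernoulli reward ("this weight vector identified a new arm") is only one plausible reading of how Thompson sampling could favor useful weights.

```python
import random


def dominates(u, v):
    """True if vector u Pareto-dominates v (all objectives are maximized)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means):
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [i for i, u in enumerate(means)
            if not any(dominates(v, u) for j, v in enumerate(means) if j != i)]


def scalarize(weights, vector):
    """Linear scalarization: the weighted sum of the objectives."""
    return sum(w * x for w, x in zip(weights, vector))


def best_arm(weights, means):
    """Index of the arm with the largest scalarized mean under these weights."""
    return max(range(len(means)), key=lambda i: scalarize(weights, means[i]))


def thompson_weight_selection(weight_set, means, rounds, rng):
    """Beta-Bernoulli Thompson sampling over weight vectors (toy version).

    Assumption: a weight vector earns a success when the arm it selects has
    not been identified before, and a failure otherwise.
    """
    successes = [1] * len(weight_set)  # Beta(1, 1) prior for each weight vector
    failures = [1] * len(weight_set)
    found = set()
    for _ in range(rounds):
        # Draw one sample per weight vector and play the largest draw.
        draws = [rng.betavariate(s, f) for s, f in zip(successes, failures)]
        k = max(range(len(weight_set)), key=lambda i: draws[i])
        arm = best_arm(weight_set[k], means)
        if arm in found:
            failures[k] += 1
        else:
            found.add(arm)
            successes[k] += 1
    return found


# Hypothetical two-objective problem with four arms (mean reward vectors).
means = [(0.9, 0.1), (0.6, 0.6), (0.1, 0.9), (0.3, 0.3)]
# Hypothetical predefined weight set; the first two weights are near-duplicates.
weight_set = [(1.0, 0.0), (0.9, 0.1), (0.5, 0.5), (0.0, 1.0)]

front = pareto_front(means)
chosen = [best_arm(w, means) for w in weight_set]
found = thompson_weight_selection(weight_set, means, rounds=200, rng=random.Random(0))

print("Pareto front:", front)
print("Arm chosen by each weight vector:", chosen)
print("Arms identified by Thompson sampling over weights:", sorted(found))
```

In this toy setting the first two weight vectors pick the same arm, which is the inefficiency of a fixed weight set that the abstract describes. A real agent would replace the known means with running estimates built from observed reward vectors.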

Original language: English
Title of host publication: ICAART 2015 - Proceedings of the International Conference on Agents and Artificial Intelligence. Volume 2. Lisbon, Portugal, 10-12-January, 2015
Publisher: SCITEPRESS-Science and Technology Publications, Lda.
Pages: 55-65
Number of pages: 11
Volume: 2
ISBN (Print): 9789897580741
Publication status: Published - 2015
Externally published: Yes
Event: 7th International Conference on Agents and Artificial Intelligence (ICAART 2015) - Lisbon, Portugal
Duration: 10 Jan 2015 to 12 Jan 2015
Conference number: 7
http://www.icaart.org/?y=2015

Conference

Conference: 7th International Conference on Agents and Artificial Intelligence (ICAART 2015)
Abbreviated title: ICAART 2015
Country: Portugal
City: Lisbon
Period: 10/01/15 to 12/01/15
Other: Conference held in conjunction with the 4th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015) and the 4th International Conference on Operations Research and Enterprise Systems (ICORES 2015)

Keywords

  • Linear scalarized function
  • Multi-armed bandit problems
  • Multi-objective optimization
  • Scalarized function set
  • Thompson sampling policy
