Scalarized lower upper confidence bound algorithm

M.M. Drugan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Techniques from multi-objective evolutionary optimisation and stochastic multi-armed bandits are combined to design stochastic multi-objective multi-armed bandit (MOMAB) algorithms with an efficient trade-off between exploration and exploitation. The lower upper confidence bound (LUCB) algorithm focuses its sampling on the arms that are most likely to be misclassified as optimal or suboptimal, in order to identify the set of best arms, that is, the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) is an adaptation of LUCB to reward vectors. Preliminary empirical results show good performance of the proposed algorithm on a bi-objective environment.
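
The idea of running LUCB on scalarized reward vectors can be illustrated with a minimal Python sketch. It is not the sMO-LUCB algorithm from the paper, whose details are not given in this abstract. It assumes a single linear scalarization (a weighted sum with fixed weights), rewards in [0, 1], a Hoeffding-style confidence radius, and a simple stopping rule. The names scalarize, confidence_radius, lucb_scalarized and make_bernoulli_env, and the parameters delta, epsilon and max_rounds, are all hypothetical.

```python
import math
import random


def scalarize(reward, weights):
    """Linear scalarization: weighted sum of a reward vector."""
    return sum(w * r for w, r in zip(weights, reward))


def confidence_radius(n_pulls, t, n_arms, delta):
    """Hoeffding-style radius (an assumed form, not taken from the paper)."""
    return math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * n_pulls))


def lucb_scalarized(pull, n_arms, weights, delta=0.1, epsilon=0.05,
                    max_rounds=20000):
    """LUCB-style best-arm identification on scalarized rewards.

    pull(arm) must return a reward vector; n_arms must be at least 2.
    Returns the index of the empirically best arm and the pull counts.
    """
    sums = [0.0] * n_arms
    counts = [0] * n_arms

    def sample(arm):
        sums[arm] += scalarize(pull(arm), weights)
        counts[arm] += 1

    for arm in range(n_arms):  # pull every arm once to initialise
        sample(arm)

    best = 0
    for t in range(1, max_rounds + 1):
        means = [s / c for s, c in zip(sums, counts)]
        radii = [confidence_radius(c, t, n_arms, delta) for c in counts]
        # Arm that currently looks optimal under this scalarization.
        best = max(range(n_arms), key=lambda a: means[a])
        # Strongest challenger: highest upper bound among the other arms.
        challenger = max((a for a in range(n_arms) if a != best),
                         key=lambda a: means[a] + radii[a])
        upper = means[challenger] + radii[challenger]
        lower = means[best] - radii[best]
        if upper - lower < epsilon:  # the two arms are separated: stop
            break
        # Sample only the two arms most at risk of being misclassified.
        sample(best)
        sample(challenger)
    return best, counts


def make_bernoulli_env(mean_vectors):
    """Hypothetical test environment: independent Bernoulli objectives."""
    def pull(arm):
        return tuple(1.0 if random.random() < p else 0.0
                     for p in mean_vectors[arm])
    return pull
```

Calling lucb_scalarized(make_bernoulli_env([(0.9, 0.9), (0.2, 0.2), (0.1, 0.3)]), 3, (0.5, 0.5)) runs the sketch on a bi-objective toy problem. A single weight vector identifies one arm only; repeating the procedure over several weight vectors is one way to recover several Pareto optimal arms, with the caveat that linear scalarization can only reach arms on the convex part of the Pareto front.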
Original language: English
Title of host publication: Learning and Intelligent Optimization
Subtitle of host publication: 9th International Conference, LION 9, Lille, France, January 12-15, 2015. Revised Selected Papers
Editors: Cl. Dhaenens, L. Jourdan, M.-E. Marmion
Place of Publication: Berlin
Publisher: Springer
Pages: 229-235
ISBN (Print): 978-3-319-19083-9
Publication status: Published - 2015

Publication series

Name: Lecture Notes in Computer Science
Volume: 8994
