Abstract
The multi-objective multi-armed bandit (MOMAB) problem is a multi-armed bandit variant with stochastic reward vectors. In this paper, we propose three MOMAB algorithms. The first algorithm uses a fixed set of linear scalarization functions to identify the Pareto front. Two topological approaches identify the Pareto front using linear weighted combinations of reward vectors. The weight hyper-rectangle decomposition algorithm explores a convex Pareto front by grouping scalarization functions that optimise the same arm into weight hyper-rectangles. It is generally acknowledged that linear scalarization cannot identify the entire Pareto front when the front is non-convex. The hierarchical PAC algorithm iteratively decomposes the Pareto front into a set of convex shapes to identify the entire Pareto front. We compare the performance of these algorithms on a bi-objective stochastic environment inspired by a real-life control application.
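The limitation of linear scalarization on non-convex fronts can be illustrated with a minimal sketch. It is not one of the paper's algorithms: it assumes the mean reward vectors are known exactly (no sampling), and the arm means, the weight grid, and the helper names `dominates`, `pareto_front`, and `scalarized_optima` are all hypothetical choices made for illustration.

```python
def dominates(u, v):
    """True if u Pareto-dominates v (maximization in every objective)."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_front(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return {i for i, u in enumerate(means)
            if not any(dominates(v, u) for j, v in enumerate(means) if j != i)}


def scalarized_optima(means, weights):
    """Indices of arms that maximize at least one linear scalarization."""
    found = set()
    for w in weights:
        # Linear scalarization: weighted sum of the objectives.
        scores = [sum(wi * mi for wi, mi in zip(w, mu)) for mu in means]
        found.add(max(range(len(means)), key=scores.__getitem__))
    return found


# Hypothetical bi-objective means (assumed known, not estimated from pulls).
# Arm 2 is Pareto optimal but lies in a non-convex region of the front.
means = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.2, 0.2)]

# A fixed grid of weight vectors (w, 1 - w); the grid size is an assumption.
weights = [(k / 10, 1 - k / 10) for k in range(11)]

front = pareto_front(means)
found = scalarized_optima(means, weights)
print("Pareto optimal arms:", sorted(front))
print("Arms found by linear scalarization:", sorted(found))
```

In this toy setting arm 2 is Pareto optimal yet no weight vector on the grid selects it, because its weighted sum is always about 0.4 while one of the extreme arms always scores at least 0.5. This is the kind of gap that motivates decomposing a non-convex front into convex pieces.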
Original language | English |
---|---|
Title of host publication | Evolutionary Multi-Criterion Optimization |
Subtitle of host publication | 8th International Conference, EMO 2015, Guimarães, Portugal, March 29 - April 1, 2015. Proceedings, Part II |
Editors | A. Gaspar-Cunha , C. Henggeler Antunes , C. Coello Coello |
Place of Publication | Berlin |
Publisher | Springer |
Pages | 156-171 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-319-15892-1 |
ISBN (Print) | 9783319158914 |
Publication status | Published - 2015 |
Externally published | Yes |
Event | 8th International Conference on Evolutionary Multi-Criterion Optimization (EMO 2015) - Guimarães, Portugal; Duration: 29 Mar 2015 → 1 Apr 2015; Conference number: 8 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 9019 |
ISSN (Print) | 03029743 |
ISSN (Electronic) | 16113349 |
Conference
Conference | 8th International Conference on Evolutionary Multi-Criterion Optimization (EMO 2015) |
---|---|
Abbreviated title | EMO 2015 |
Country/Territory | Portugal |
City | Guimarães |
Period | 29/03/15 → 1/04/15 |
Keywords
- Multi-objective multi-armed bandits
- Pareto front identification
- Scalarization functions
- Topological decomposition