Abstract
We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MABs) with reward vectors. The Pareto dominance relation is used to assess the quality of reward vectors in infinite horizon MAB algorithms such as UCB1 and UCB2. In single-objective MABs, there is a trade-off between exploring the suboptimal arms and exploiting a single optimal arm. Pareto dominance based MABs fairly exploit all Pareto optimal arms while exploring the suboptimal arms. We study the exploration vs exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds, and we experimentally compare their exploration vs exploitation trade-off on a bi-objective Bernoulli environment coming from control theory.
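As a rough illustration of the ideas named in the abstract, the following Python sketch shows a Pareto dominance test for reward vectors and one arm-selection step of a UCB-style policy that picks uniformly among the arms whose index vectors are non-dominated. It is not taken from the paper: the function names (`dominates`, `pareto_front`, `pareto_ucb1_step`) are hypothetical, and the exploration bonus is assumed to be the standard single-objective UCB1 term applied to every objective, which may differ from the indices the authors analyse.

```python
import math
import random


def dominates(u, v):
    """Return True if reward vector u Pareto-dominates v (maximisation)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Return the indices of vectors that no other vector dominates."""
    return [
        i for i, u in enumerate(vectors)
        if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)
    ]


def pareto_ucb1_step(means, counts, t, rng=random):
    """Choose an arm uniformly at random from the Pareto front of index vectors.

    means  : one tuple of empirical mean rewards per arm (one entry per objective)
    counts : number of pulls per arm, each assumed to be at least 1
    t      : total number of pulls so far (t >= 1)
    Assumption: the bonus sqrt(2 ln t / n_i) is the classic UCB1 term,
    added to every objective of arm i.
    """
    index = [
        tuple(m + math.sqrt(2.0 * math.log(t) / n) for m in mean)
        for mean, n in zip(means, counts)
    ]
    return rng.choice(pareto_front(index))
```

With equal pull counts the bonus is the same for every arm, so the selection reduces to a uniform choice among the arms whose empirical mean vectors are Pareto optimal; this is the "fair exploitation" of all Pareto optimal arms that the abstract refers to.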
| Original language | English |
|---|---|
| Title of host publication | Agents and Artificial Intelligence |
| Subtitle of host publication | 7th International Conference, ICAART 2015, Lisbon, Portugal, January 10-12, 2015, Revised Selected Papers |
| Editors | B. Duval, J. van den Herik, St. Loiseau, J. Filipe |
| Place of Publication | Berlin |
| Publisher | Springer |
| Pages | 128-144 |
| Number of pages | 17 |
| ISBN (Electronic) | 978-3-319-27947-3 |
| ISBN (Print) | 9783319279466 |
| DOIs | |
| Publication status | Published - 2015 |
| Externally published | Yes |
| Event | 7th International Conference on Agents and Artificial Intelligence (ICAART 2015) - Lisbon, Portugal; Duration: 10 Jan 2015 → 12 Jan 2015; Conference number: 7; http://www.icaart.org/?y=2015 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 9494 |
| ISSN (Print) | 03029743 |
| ISSN (Electronic) | 16113349 |
Conference
| Conference | 7th International Conference on Agents and Artificial Intelligence (ICAART 2015) |
|---|---|
| Abbreviated title | ICAART 2015 |
| Country/Territory | Portugal |
| City | Lisbon |
| Period | 10/01/15 → 12/01/15 |
| Other | Conference held in conjunction with the 4th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015) and the 4th International Conference on Operations Research and Enterprise Systems (ICORES 2015) |
| Internet address | |
Keywords
- Infinite horizon policies
- Multi-armed bandits
- Multi-objective optimisation
- Pareto dominance relation
Fingerprint
Dive into the research topics of 'Infinite horizon multi-armed bandits with reward vectors: exploration/exploitation trade-off'. Together they form a unique fingerprint.