Abstract
An empirical comparative study is made of a sample of action selection policies on a test suite of the Bernoulli multi-armed bandit with K = 10, K = 20 and K = 50 arms, for each of which we consider several success probabilities. In such problems each reward is either a success or a failure, and each arm's success rate is unknown. Our study focusses on ε-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, the knowledge gradient and a new hybrid algorithm. The last two are not well known in computer science. In this paper, we examine how the policies depend on the horizon and report results which suggest that a new hybridized procedure based on Thompson sampling improves on Thompson sampling's regret.
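The setting in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the paper's experimental code: it assumes cumulative expected (pseudo-)regret as the metric, a uniform Beta(1, 1) prior for Thompson sampling, and the standard UCB1-Tuned index of Auer et al. (2002) specialised to 0/1 rewards. The function names, ε = 0.1, the horizon and the success probabilities are hypothetical illustration choices. The Gittins index, the knowledge gradient and the hybrid algorithm are not shown.

```python
import math
import random
from typing import Callable, List

# Hypothetical policy signature: (successes, failures, t, rng) -> arm index.
Policy = Callable[[List[int], List[int], int, random.Random], int]


def epsilon_greedy(eps: float = 0.1) -> Policy:
    """Explore uniformly with probability eps, else pull the best empirical mean."""
    def choose(s, f, t, rng):
        k = len(s)
        if rng.random() < eps:
            return rng.randrange(k)
        # Unpulled arms get mean +inf so each arm is tried at least once.
        means = [s[i] / (s[i] + f[i]) if s[i] + f[i] else math.inf for i in range(k)]
        return max(range(k), key=means.__getitem__)
    return choose


def ucb1_tuned(s, f, t, rng):
    """UCB1-Tuned index specialised to Bernoulli (0/1) rewards."""
    k = len(s)
    for i in range(k):  # initialisation: pull every arm once
        if s[i] + f[i] == 0:
            return i
    log_t = math.log(t)

    def index(i):
        n = s[i] + f[i]
        mean = s[i] / n
        # For 0/1 rewards the sample variance is mean - mean^2.
        v = mean - mean * mean + math.sqrt(2 * log_t / n)
        return mean + math.sqrt(log_t / n * min(0.25, v))

    return max(range(k), key=index)


def thompson(s, f, t, rng):
    """Thompson sampling: draw from each arm's Beta posterior, pull the argmax."""
    samples = [rng.betavariate(s[i] + 1, f[i] + 1) for i in range(len(s))]
    return max(range(len(s)), key=samples.__getitem__)


def run(policy: Policy, probs: List[float], horizon: int, seed: int = 0) -> float:
    """Return the cumulative expected (pseudo-)regret after `horizon` pulls."""
    rng = random.Random(seed)
    k = len(probs)
    s, f = [0] * k, [0] * k
    best, regret = max(probs), 0.0
    for t in range(1, horizon + 1):
        arm = policy(s, f, t, rng)
        if rng.random() < probs[arm]:  # Bernoulli reward: success or failure
            s[arm] += 1
        else:
            f[arm] += 1
        regret += best - probs[arm]
    return regret


if __name__ == "__main__":
    gen = random.Random(42)
    probs = [gen.uniform(0.1, 0.9) for _ in range(10)]  # hypothetical K = 10 instance
    for name, pol in [("eps-greedy", epsilon_greedy(0.1)),
                      ("UCB1-Tuned", ucb1_tuned),
                      ("Thompson", thompson)]:
        print(f"{name:11s} regret after 1000 pulls: {run(pol, probs, 1000):.1f}")
```

Studying horizon dependence, as the abstract describes, amounts to repeating `run` over several horizons and seeds and averaging; a single seeded run as above only illustrates the mechanics.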
Original language | English |
---|---|
Title of host publication | 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2015 - Proceedings |
Publisher | i6doc.com publication |
Pages | 59-64 |
Number of pages | 6 |
ISBN (Print) | 9782875870148 |
Publication status | Published - 2015 |
Event | 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015) - Hotel Novotel, Bruges, Belgium |
Duration | 22 Apr 2015 → 24 Apr 2015 |
Conference number | 23 |
Conference
Conference | 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015) |
---|---|
Abbreviated title | ESANN 2015 |
Country/Territory | Belgium |
City | Bruges |
Period | 22/04/15 → 24/04/15 |