Bernoulli bandits: an empirical comparison

K. N. Ronoh, R. Oyamo, E. Milgo, M. Drugan, B. Manderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

3 Citations (Scopus)

Abstract

An empirical comparative study is made of a sample of action selection policies on a test suite of the Bernoulli multi-armed bandit with K = 10, K = 20 and K = 50 arms, for each of which we consider several success probabilities. For such problems the rewards are either Success or Failure, with an unknown success rate. Our study focusses on ε-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, the knowledge gradient and a new hybrid algorithm. The last two are not well known in computer science. In this paper, we examine policy dependence on the horizon and report results which suggest that a new hybridized procedure based on Thompson sampling improves on its regret.
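
As an illustration of the setting described in the abstract, the following is a minimal sketch (not the authors' code) of a Bernoulli bandit simulation that measures cumulative expected regret for two of the named policies, ε-greedy and Thompson sampling. The function names, the uniform Beta(1, 1) prior, the value ε = 0.1, the optimistic treatment of untried arms, and the example success probabilities are all assumptions made for this sketch; UCB1-Tuned, the Gittins index policy, the knowledge gradient and the paper's hybrid algorithm are not reproduced here.

```python
import random


def run_bandit(policy, probs, horizon, rng):
    """Play one episode and return the cumulative expected regret.

    probs   -- true success probability of each arm (unknown to the policy)
    horizon -- number of pulls
    rng     -- a random.Random instance, so runs are reproducible
    """
    k = len(probs)
    successes = [0] * k  # observed Success counts per arm
    failures = [0] * k   # observed Failure counts per arm
    best = max(probs)
    regret = 0.0
    for _ in range(horizon):
        arm = policy(successes, failures, rng)
        # Bernoulli reward: Success with probability probs[arm]
        if rng.random() < probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Expected regret of this pull: gap to the best arm
        regret += best - probs[arm]
    return regret


def epsilon_greedy(successes, failures, rng, epsilon=0.1):
    """Explore uniformly with probability epsilon, else pick the best empirical mean."""
    k = len(successes)
    if rng.random() < epsilon:
        return rng.randrange(k)
    # Assumption: untried arms get an optimistic mean of 1.0 so each is tried once
    means = [s / (s + f) if s + f > 0 else 1.0
             for s, f in zip(successes, failures)]
    return max(range(k), key=means.__getitem__)


def thompson_sampling(successes, failures, rng):
    """Sample each arm's success rate from its Beta posterior and pick the largest."""
    # Assumption: uniform Beta(1, 1) prior on every arm
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


if __name__ == "__main__":
    # Hypothetical K = 10 test problem; these probabilities are not from the paper
    probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8]
    for name, policy in [("epsilon-greedy", epsilon_greedy),
                         ("Thompson sampling", thompson_sampling)]:
        regret = run_bandit(policy, probs, horizon=1000, rng=random.Random(0))
        print(name, round(regret, 2))
```

Changing `horizon` in the final loop gives a rough feel for the horizon dependence the abstract refers to. A single run is noisy, so a comparison of this kind would normally average the regret over many independent runs.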

Original language: English
Title of host publication: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2015 - Proceedings
Publisher: i6doc.com publication
Pages: 59-64
Number of pages: 6
ISBN (Print): 9782875870148
Publication status: Published - 2015
Event: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015) - Hotel Novotel, Bruges, Belgium
Duration: 22 Apr 2015 → 24 Apr 2015
Conference number: 23

Conference

Conference: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015)
Abbreviated title: ESANN 2015
Country/Territory: Belgium
City: Bruges
Period: 22/04/15 → 24/04/15