The Impact of Batch Learning in Stochastic Linear Bandits

Research output: Working paper › Preprint › Academic


Abstract

We consider a special case of bandit problems, called batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant, batch-centric scenario of batch learning. That is, we provide a policy-agnostic regret analysis and establish upper and lower bounds on the regret of a candidate policy. Our main theoretical results show that the impact of batch learning is a multiplicative factor of the batch size relative to the regret of online behavior. We primarily study two settings of stochastic linear bandits: bandits with finitely and infinitely many arms. While the regret bounds are the same for both settings, the results for the former hold under milder assumptions. As an important additional insight, we provide a more robust result for the 2-armed bandit problem. Finally, we demonstrate the consistency of our theoretical results through empirical experiments and reflect on the choice of optimal batch size.
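
As a rough illustration of the setting described above (not taken from the paper), the sketch below runs a LinUCB-style policy on a finite-arm stochastic linear bandit but only refreshes its estimates at batch boundaries, so decisions within a batch rely on stale information. All constants (dimension, horizon, noise level, batch sizes) and the choice of LinUCB are hypothetical choices made for this sketch; the expectation, in line with the abstract, is that cumulative regret grows roughly by a multiplicative factor of the batch size compared to fully online updates.

# Illustrative simulation (not the paper's code): finite-arm stochastic linear
# bandit with LinUCB whose estimates are refreshed only at batch boundaries.
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 10, 2000                                  # feature dim, arms, horizon (hypothetical)
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)                         # unknown parameter, unit norm
arms = rng.normal(size=(K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)    # unit-norm arm features
best_mean = np.max(arms @ theta)                       # mean reward of the optimal arm

def batched_linucb(batch_size, alpha=1.0, lam=1.0):
    """Run LinUCB, updating the estimate only every `batch_size` rounds."""
    A = lam * np.eye(d)          # regularized Gram matrix
    bvec = np.zeros(d)           # sum of reward-weighted features
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ bvec
    regret = 0.0
    for t in range(T):
        if t % batch_size == 0:  # batch boundary: refresh the policy
            A_inv = np.linalg.inv(A)
            theta_hat = A_inv @ bvec
        # optimistic index computed from the (possibly stale) estimate
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        a = int(np.argmax(arms @ theta_hat + alpha * widths))
        x = arms[a]
        reward = x @ theta + 0.1 * rng.normal()
        A += np.outer(x, x)      # data accumulate every round, but are only used at boundaries
        bvec += reward * x
        regret += best_mean - x @ theta
    return regret

for b in (1, 10, 50, 100):
    print(f"batch size {b:3d}: cumulative regret ~ {batched_linucb(b):.1f}")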
Original language: English
Publication status: Published - 14 Feb 2022

Bibliographical note

This is a longer version of the paper published at ICDM'22. arXiv admin note: text overlap with arXiv:2111.02071
