Abstract
We describe a generalized Q-learning-type algorithm for
reinforcement learning in competitive multi-agent games. We
observe that in a competitive setting with adaptive agents, an
agent's actions will likely result in changes in its opponents' policies.
In addition to accounting for the estimated policies of the opponents,
our algorithm also adjusts these estimated future opponent policies by
incorporating estimates of how each opponent changes its policy in
reaction to one's own actions. We present results showing that agents
that learn with this algorithm can successfully achieve high reward in
competitive multi-agent games where myopic self-interested behavior
conflicts with the long-term individual interests of the players. We
show that this approach scales successfully to multi-agent games of
various sizes, in particular to social-dilemma problems: from
the small iterated Prisoner's Dilemma to larger settings akin to Hardin's
Tragedy of the Commons. Thus, our multi-agent reinforcement learning
algorithm is foresighted enough to correctly anticipate future rewards
in the important problem class of social dilemmas, without having to
resort to negotiation-like protocols or precoded strategies.
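The abstract does not spell out the paper's update rule, so the following is only an illustrative sketch of the general idea: a Q-learner on the iterated Prisoner's Dilemma that, in addition to its Q-values, maintains an empirical model of how the opponent reacts to its own actions and uses that anticipated reaction when choosing an action. The tit-for-tat opponent, the reaction-count model, and all parameter values are assumptions for the sketch, not details from the paper.

```python
import random

# Prisoner's Dilemma payoffs for the row player; actions: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

def train(episodes=20000, gamma=0.9, alpha=0.1, eps=0.1, seed=0):
    """Q-learning against a tit-for-tat opponent (an assumed stand-in),
    augmented with an empirical reaction model:
    reaction[a][b] counts how often the opponent answered my action a
    with action b on the next step (Laplace-smoothed)."""
    rng = random.Random(seed)
    # State = opponent's previous action; Q[state][my_action].
    Q = [[0.0, 0.0], [0.0, 0.0]]
    reaction = [[1.0, 1.0], [1.0, 1.0]]
    opp_last = 0  # tit-for-tat opens with cooperation

    for _ in range(episodes):
        s = opp_last

        def anticipated_value(a):
            # Foresighted value of action a: immediate payoff against the
            # opponent's current action, plus the discounted value of the
            # state the *predicted* opponent reaction would put us in.
            p_coop = reaction[a][0] / (reaction[a][0] + reaction[a][1])
            future = p_coop * max(Q[0]) + (1 - p_coop) * max(Q[1])
            return PAYOFF[(a, s)] + gamma * future

        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=anticipated_value)
        opp_next = a              # tit-for-tat copies our move
        r = PAYOFF[(a, s)]
        reaction[a][opp_next] += 1
        Q[s][a] += alpha * (r + gamma * max(Q[opp_next]) - Q[s][a])
        opp_last = opp_next

    return Q, reaction

Q, reaction = train()
```

In this toy setting the reaction model learns that cooperating keeps the opponent cooperative while defecting triggers retaliation, so the foresighted action choice settles on mutual cooperation rather than the myopically attractive defection, which is the qualitative behavior the abstract claims for social dilemmas.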
| Original language | English |
|---|---|
| Title of host publication | ECAI 2006, 17th European Conference on Artificial Intelligence, August 29 - September 1, 2006, Riva del Garda, Italy, Including Prestigious Applications of Intelligent Systems (PAIS 2006), Proceedings |
| Editors | G. Brewka, S. Coradeschi, A. Perini |
| Place of Publication | Amsterdam |
| Publisher | IOS Press |
| Pages | 536-540 |
| ISBN (Print) | 1-58603-642-4 |
| Publication status | Published - 2006 |