Temporal difference learning for the game Tic-Tac-Toe 3D: applying structure to neural networks

M. van de Steeg, M.M. Drugan, M. Wiering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

9 Citations (Scopus)

Abstract

When reinforcement learning is applied to large state spaces, such as those that occur in board games, a good function approximator for the value function is very important. In previous research, multilayer perceptrons have often been used quite successfully as function approximators for learning to play particular games with temporal difference learning. With the recent developments in deep learning, it is important to study whether multiple hidden layers or particular network structures can improve learning of the value function. In this paper, we compare five different structures of multilayer perceptrons for learning to play the game Tic-Tac-Toe 3D, both when training through self-play and when training against the same fixed opponent they are tested against. We compare three fully connected multilayer perceptrons with different numbers of hidden layers and/or hidden units, as well as two structured ones. These structured multilayer perceptrons have a first hidden layer that is only sparsely connected to the input layer and whose units correspond to the rows in Tic-Tac-Toe 3D. This allows them to more easily learn the contribution of specific patterns on the corresponding rows. One of the two structured multilayer perceptrons has a second hidden layer that is fully connected to the first one, which allows the neural network to learn to non-linearly integrate the information in these detected patterns. The results on Tic-Tac-Toe 3D show that the deep structured neural network with integrated pattern detectors has the strongest performance of the compared multilayer perceptrons against a fixed opponent, both when trained through self-play and when trained against this fixed opponent.
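The sparsely connected first hidden layer described above can be pictured as a connectivity mask between board cells and row units. The following is a minimal sketch, not the authors' implementation. It assumes a 4x4x4 board, one input per cell, and one hidden unit per winning row, none of which the abstract states; the names `winning_lines`, `cell_index`, and `connectivity_mask` are hypothetical.

```python
from itertools import product

N = 4  # assumed board edge length (4x4x4); the abstract does not state it


def winning_lines(n=N):
    """Enumerate every straight line of n cells in an n x n x n cube."""
    lines = set()
    directions = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    for start in product(range(n), repeat=3):
        for d in directions:
            cells = [tuple(s + k * step for s, step in zip(start, d))
                     for k in range(n)]
            # Keep the line only if all n cells stay inside the cube.
            if all(0 <= c < n for cell in cells for c in cell):
                # A frozenset merges the two traversal directions of a line.
                lines.add(frozenset(cells))
    return sorted(sorted(line) for line in lines)


def cell_index(cell, n=N):
    """Flatten (x, y, z) to an input index (assumed: one input per cell)."""
    x, y, z = cell
    return (x * n + y) * n + z


def connectivity_mask(lines, n=N):
    """mask[j][i] is 1 if row unit j is wired to input cell i, else 0."""
    mask = [[0] * n ** 3 for _ in lines]
    for j, line in enumerate(lines):
        for cell in line:
            mask[j][cell_index(cell, n)] = 1
    return mask


LINES = winning_lines()
MASK = connectivity_mask(LINES)
print(len(LINES), "row units,", sum(map(sum, MASK)), "input connections")
# prints: 76 row units, 304 input connections
```

Under these assumptions the structured layer has 76 x 4 = 304 input weights, where a fully connected layer of the same width would have 76 x 64 = 4864. Multiplying a dense weight matrix elementwise by such a mask is one common way to realise a sparse layer; whether the paper does it this way is not stated in the abstract.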
Original language: English
Title of host publication: 2015 IEEE Symposium Series on Computational Intelligence, 7-10 December 2015, Cape Town, South Africa
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 564-570
Number of pages: 7
ISBN (Print): 978-1-4799-7560-0
Publication status: Published - 2015

Keywords

  • Benchmark testing
  • Games
  • Learning (artificial intelligence)
  • Multilayer perceptrons
  • Three-dimensional displays
  • Tic-Tac-Toe 3D game
  • Training
  • computer games
  • deep structured neural network
  • integrated pattern detectors
  • neural networks
  • temporal difference learning
