Abstract
In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge, based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth that extends the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as quick wins and slow defeats, scoring, and mobility or presence. The fourth is another variant of unbounded minimax, which plays the safest action instead of the best action. This modified search is intended to be used after the learning process. The fifth is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these techniques to design program-players for the game of Hex (sizes 11 and 13) that, using reinforcement learning from self-play without knowledge, surpass the level of Mohex 3HNN.