Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning

Abstract

The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modeled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, using the Almgren-Chriss (2000) framework. Our results show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit tacit collusion, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
