In this paper, we propose a first formalization of the process of exploitation of SQL injection vulnerabilities. We consider a simplification of the dynamics of SQL injection attacks by casting this problem as a security capture-the-flag challenge. We model it as a Markov decision process, and we implement it as a reinforcement learning problem. We then deploy different reinforcement learning agents tasked with learning an effective policy to perform SQL injection; we design our training in such a way that the agent learns not just a specific strategy to solve an individual challenge but a more generic policy that may be applied to perform SQL injection attacks against any system instantiated randomly by our problem generator. We analyze the results in terms of the quality of the learned policy and in terms of convergence time as a function of the complexity of the challenge and the learning agent's complexity. Our work fits in the wider research on the development of intelligent agents for autonomous penetration testing and white-hat hacking, and our results aim to contribute to understanding the potential and the limits of reinforcement learning in a security environment.