In recent years, reinforcement learning has seen interest because of deepQ-Learning, where the model is a convolutional neural network. Deep Q-Learninghas shown promising results in games such as Atari and AlphaGo. Instead oflearning the entire Q-table, it learns an estimate of the Q function thatdetermines a state's policy action. We use Q-Learning and deep Q-learning, tolearn control policies of four constraint satisfaction games (15-Puzzle,Minesweeper, 2048, and Sudoku). 15-Puzzle is a sliding permutation puzzle andprovides a challenge in addressing its large state space. Minesweeper andSudoku involve partially observable states and guessing. 2048 is also a slidingpuzzle but allows for easier state representation (compared to 15-Puzzle) anduses interesting reward shaping to solve the game. These games offer uniqueinsights into the potential and limits of reinforcement learning. The Q agentis trained with no rules of the game, with only the reward corresponding toeach state's action. Our unique contribution is in choosing the rewardstructure, state representation, and formulation of the deep neural network.For low shuffle, 15-Puzzle, achieves a 100% win rate, the medium and highshuffle achieve about 43% and 22% win rates respectively. On a standard 16x16Minesweeper board, both low and high-density boards achieve close to 45% winrate, whereas medium density boards have a low win rate of 15%. For 2048, the1024 win rate was achieved with significant ease (100%) with high win rates for2048, 4096, 8192 and 16384 as 40%, 0.05%, 0.01% and 0.004% , respectively. Theeasy Sudoku games had a win rate of 7%, while medium and hard games had 2.1%and 1.2% win rates, respectively. This paper explores the environmentcomplexity and behavior of a subset of constraint games using reward structureswhich can get us closer to understanding how humans learn.