We investigate whether quantum annealers with select chip layouts can outperform classical computers in reinforcement learning tasks. We associate a transverse-field Ising spin Hamiltonian with a layout of qubits similar to that of a deep Boltzmann machine (DBM) and use simulated quantum annealing (SQA) to numerically simulate quantum sampling from this system. We design a reinforcement learning algorithm in which the visible nodes, which represent the states and actions of an optimal policy, form the first and last layers of the deep network. In the absence of a transverse field, our simulations show that DBMs are trained more effectively than restricted Boltzmann machines (RBMs) with the same number of nodes. We then develop a framework for training the network as a quantum Boltzmann machine (QBM) in the presence of a significant transverse field for reinforcement learning. This method also outperforms the reinforcement learning method that uses RBMs.
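
To make the role of the transverse field concrete, the following minimal sketch builds the dense matrix of a transverse-field Ising Hamiltonian for a handful of qubits. It is illustrative only: the sign convention H = -sum_{i<j} w_ij Z_i Z_j - sum_i b_i Z_i - Gamma sum_i X_i, the function names, and the 3-qubit couplings are all assumptions, and exact matrix construction is used here in place of SQA, so it is feasible only for very small systems.

```python
import numpy as np

# Pauli matrices and the 2x2 identity.
SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
IDENTITY = np.eye(2)


def single_site_operator(op, site, n_qubits):
    """Return `op` acting on `site` and the identity on every other qubit."""
    result = np.array([[1.0]])
    for k in range(n_qubits):
        result = np.kron(result, op if k == site else IDENTITY)
    return result


def transverse_field_ising(weights, biases, gamma):
    """Dense matrix of the assumed Hamiltonian (hypothetical helper).

    H = -sum_{i<j} w_ij Z_i Z_j - sum_i b_i Z_i - gamma * sum_i X_i
    Only the upper triangle (i < j) of `weights` is read.
    """
    n = len(biases)
    z_ops = [single_site_operator(SIGMA_Z, i, n) for i in range(n)]
    x_ops = [single_site_operator(SIGMA_X, i, n) for i in range(n)]
    hamiltonian = np.zeros((2 ** n, 2 ** n))
    for i in range(n):
        hamiltonian -= biases[i] * z_ops[i]
        hamiltonian -= gamma * x_ops[i]
        for j in range(i + 1, n):
            hamiltonian -= weights[i][j] * (z_ops[i] @ z_ops[j])
    return hamiltonian


# Hypothetical 3-qubit chain; couplings and biases are illustrative values only.
weights = np.array([[0.0, 0.5, 0.0], [0.0, 0.0, -0.3], [0.0, 0.0, 0.0]])
biases = np.array([0.1, 0.0, -0.2])

h_classical = transverse_field_ising(weights, biases, gamma=0.0)
h_quantum = transverse_field_ising(weights, biases, gamma=1.0)
```

With `gamma=0.0` the matrix is diagonal in the computational basis, so each diagonal entry is the classical Boltzmann-machine energy of one spin configuration. A nonzero `gamma` adds off-diagonal terms that couple configurations differing by a single spin flip, which is the regime the abstract refers to as a quantum Boltzmann machine.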