We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
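
To illustrate what differentiating through the KKT conditions of a convex problem means, the following is a minimal sketch on a toy equality-constrained quadratic program. It is not the MPC solver described here: the problem data (`Q`, `p`, `A`, `b`, `target`), the squared-error loss, and all helper names are hypothetical and chosen only for illustration. The solution comes from one linear solve with the KKT matrix, and the gradient of a downstream loss with respect to the linear cost term `p` comes from a second solve with the same matrix; a finite-difference estimate is included as a sanity check.

```python
# Toy problem (assumed for illustration, not taken from the paper):
#   minimize 0.5 * x^T Q x + p^T x   subject to   A x = b

def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def kkt_matrix(Q, A):
    """Assemble the symmetric KKT matrix [[Q, A^T], [A, 0]]."""
    n, m = len(Q), len(A)
    top = [list(Q[i]) + [A[k][i] for k in range(m)] for i in range(n)]
    bottom = [list(A[k]) + [0.0] * m for k in range(m)]
    return top + bottom

def solve_qp(Q, p, A, b):
    """Forward pass: solve the KKT system and return the primal solution x*."""
    n = len(Q)
    z = solve_linear(kkt_matrix(Q, A), [-pi for pi in p] + list(b))
    return z[:n]

def grad_wrt_p(Q, A, dloss_dx):
    """Backward pass: d(loss)/dp from one extra solve with the KKT matrix."""
    n, m = len(Q), len(A)
    d = solve_linear(kkt_matrix(Q, A), list(dloss_dx) + [0.0] * m)
    return [-di for di in d[:n]]

def loss(x, target):
    """Hypothetical downstream loss: squared distance to a target solution."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

Q = [[2.0, 0.0], [0.0, 1.0]]
p = [1.0, -1.0]
A = [[1.0, 1.0]]
b = [1.0]
target = [0.2, 0.8]

x_star = solve_qp(Q, p, A, b)
grad = grad_wrt_p(Q, A, [xi - ti for xi, ti in zip(x_star, target)])

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
fd_grad = []
for i in range(len(p)):
    p_hi = list(p); p_hi[i] += eps
    p_lo = list(p); p_lo[i] -= eps
    fd_grad.append((loss(solve_qp(Q, p_hi, A, b), target)
                    - loss(solve_qp(Q, p_lo, A, b), target)) / (2 * eps))

print("x* =", x_star)
print("analytic gradient =", grad)
print("finite-difference gradient =", fd_grad)
```

The backward pass reuses the matrix already factored for the forward pass, which is what makes this kind of implicit differentiation cheap compared with unrolling a solver. A full MPC setting would add dynamics constraints, control bounds, and a nonconvex problem handled through local convex approximations, none of which this sketch attempts.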