Scaling model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unknown dynamics remains an open problem. The key challenges lie in learning good dynamics models, developing algorithms that scale to high-dimensional state-spaces and being able to learn from both visual and proprioceptive demonstrations. In this work, we present a gradient-based inverse reinforcement learning framework that utilizes a pre-trained visual dynamics model to learn cost functions when given only visual human demonstrations. The learned cost functions are then used to reproduce the demonstrated behavior via visual model predictive control. We evaluate our framework on hardware on two basic object manipulation tasks.
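
To make the structure of such a framework concrete, the following is a minimal toy sketch of one common way to set up gradient-based IRL with a fixed dynamics model; it is not the implementation evaluated in this work. Everything in it is an assumption made for illustration: the state is a single scalar standing in for visual features, `dynamics` is a hand-written stand-in for a pre-trained model, the cost has one learnable parameter (`goal`), and central finite differences replace automatic differentiation to keep the example dependency-free. All function names are hypothetical.

```python
# Toy bi-level sketch: an inner planner minimizes the current cost,
# an outer loop adjusts the cost so the plan matches a demonstration.

def dynamics(state, action):
    """Stand-in for a pre-trained dynamics model (assumed): a 1-D point moved by `action`."""
    return state + action

def rollout(s0, actions):
    """Predict the state sequence obtained by applying `actions` from `s0`."""
    states, s = [], s0
    for a in actions:
        s = dynamics(s, a)
        states.append(s)
    return states

def cost(states, goal):
    """Parametrized cost (assumed form): summed squared distance to a learnable goal."""
    return sum((s - goal) ** 2 for s in states)

def grad(f, x, eps=1e-5):
    """Central finite-difference gradient of scalar f with respect to the list x."""
    g = []
    for i in range(len(x)):
        hi = list(x); hi[i] += eps
        lo = list(x); lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def plan(s0, goal, horizon, steps=200, lr=0.1):
    """Inner loop: gradient descent on the action sequence under the current cost."""
    actions = [0.0] * horizon
    for _ in range(steps):
        g = grad(lambda a: cost(rollout(s0, a), goal), actions)
        actions = [a - lr * gi for a, gi in zip(actions, g)]
    return actions

def irl_loss(params, s0, demo):
    """Outer objective: squared error between planned and demonstrated states."""
    planned = rollout(s0, plan(s0, params[0], len(demo)))
    return sum((p - d) ** 2 for p, d in zip(planned, demo))

def learn_goal(s0, demo, iters=50, lr=0.05):
    """Outer loop: gradient descent on the cost parameter, differentiating through the planner."""
    params = [0.0]
    for _ in range(iters):
        g = grad(lambda th: irl_loss(th, s0, demo), params)
        params = [params[0] - lr * g[0]]
    return params[0]
```

As a usage example, `learn_goal(0.0, [2.0, 2.0, 2.0])` fits the cost to a demonstration that moves to 2.0 and stays there; calling `plan` with the learned parameter from a different start state then plays the role of model predictive control with the learned cost.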