Recent work in imitation learning articulates its formulation around the GAIL architecture, relying on the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. In this work, we dramatically reduce the number of interactions with the environment by leveraging an off-policy actor-critic architecture. Additionally, employing deterministic policy gradients allows us to treat the learned reward as a differentiable node in the computational graph, while preserving the model-free nature of our approach. Our experiments span a variety of continuous control tasks.