Classical distillation methods transfer representations from a "teacher" neural network to a "student" network by matching their output activations. Recent methods also match the Jacobians, that is, the gradients of the output activations with respect to the input. However, this involves making some ad hoc decisions, in particular the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning, by establishing the equivalence of a recent transfer learning procedure to distillation. Finally, we show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.
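
As a rough illustration of the claimed equivalence, the following sketch applies a first-order Taylor expansion to a squared-error distillation loss evaluated on noisy inputs. The symbols are assumptions introduced here and do not appear in the abstract: $T$ and $S$ are scalar-output teacher and student networks, $x$ is an input, and $\xi \sim \mathcal{N}(0, \sigma^2 I)$ is small additive input noise. The squared-error loss is chosen only for simplicity; the paper's actual loss functions may differ.

```latex
% First-order expansion around x (higher-order terms dropped):
%   T(x + \xi) \approx T(x) + \nabla_x T(x)^\top \xi, and likewise for S.
\begin{align*}
\mathbb{E}_{\xi}\!\left[\big(T(x+\xi) - S(x+\xi)\big)^2\right]
  &\approx \mathbb{E}_{\xi}\!\left[\Big(T(x) - S(x)
     + \big(\nabla_x T(x) - \nabla_x S(x)\big)^\top \xi\Big)^2\right] \\
  &= \big(T(x) - S(x)\big)^2
     + \sigma^2 \,\big\lVert \nabla_x T(x) - \nabla_x S(x) \big\rVert_2^2 .
\end{align*}
% The cross term vanishes because E[\xi] = 0, and
% E[\xi \xi^\top] = \sigma^2 I yields the squared gradient difference.
```

Under these assumptions, distilling on noisy inputs behaves like ordinary output matching plus a Jacobian-matching penalty weighted by the noise variance $\sigma^2$, which suggests how the form of the distillation loss can determine the form of the Jacobian loss.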