Abstract
In this paper, we argue that mutual distillation between reinforcement learning policies serves as implicit regularization, preventing them from overfitting to irrelevant features. We highlight two key contributions: (a) Theoretically, for the first time, we prove that enhancing policy robustness to irrelevant features leads to improved generalization performance. (b) Empirically, we demonstrate that mutual distillation between policies contributes to such robustness, enabling the spontaneous emergence of invariant representations over pixel inputs. Overall, our findings challenge the conventional view of distillation as merely a means of knowledge transfer, offering a novel perspective on generalization in deep reinforcement learning.
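The abstract does not spell out the distillation objective. The sketch below is a hedged illustration only. It assumes a setup in the style of deep mutual learning with discrete actions: two policies see the same batch of observations, and each one is pulled toward the other's action distribution by a KL term. The function names (`log_softmax`, `kl_divergence`, `mutual_distillation_losses`) and the KL direction are hypothetical choices made for illustration. They are not the paper's confirmed method.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) for two categorical distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mutual_distillation_losses(
    batch_logits_a: Sequence[Sequence[float]],
    batch_logits_b: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Hypothetical mutual-distillation terms for two policies A and B.

    Assumption: each policy is distilled toward the other on the same
    observations. A's term uses B's distribution as the target (KL(pi_B || pi_A)),
    and B's term uses A's (KL(pi_A || pi_B)). In an actual training loop the
    target side would be held fixed (no gradient), and each term would be added
    to that policy's own RL loss.
    """
    n = len(batch_logits_a)
    loss_a = 0.0
    loss_b = 0.0
    for logits_a, logits_b in zip(batch_logits_a, batch_logits_b):
        log_pa = log_softmax(logits_a)
        log_pb = log_softmax(logits_b)
        loss_a += kl_divergence(log_pb, log_pa)  # pull A toward B
        loss_b += kl_divergence(log_pa, log_pb)  # pull B toward A
    return loss_a / n, loss_b / n


if __name__ == "__main__":
    # Two hypothetical policies scoring 3 actions for a batch of 2 observations.
    a = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
    b = [[3.0, 2.0, 1.0], [1.0, 0.0, -1.0]]
    print(mutual_distillation_losses(a, b))
```

Both terms are zero only when the two policies agree on every observation. Because a softmax distribution does not change when a constant is added to all logits, the terms depend only on the action distributions and not on the raw logit scale.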