Abstract
Visual reinforcement learning agents typically face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy. By transferring cluttered observations to clean ones with a denoising model, SCMA can mitigate distractions for various policies as a plug-and-play enhancement. To optimize the denoising model in an unsupervised manner, we derive an unsupervised distribution matching objective with a theoretical analysis of its optimality. We further present a practical algorithm to optimize the objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency.