Abstract
Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.