Abstract
Meta Reinforcement Learning (Meta RL) trains agents that adapt to fast-changing environments and tasks. Current strategies often lose adaptation efficiency because model exploration is passive, which delays the agent's understanding of new transition dynamics. As a result, particularly fast-evolving tasks become impossible to solve. We propose a novel approach, Hypothesis Network Planned Exploration (HyPE), that integrates an active, planned exploration process via a hypothesis network to optimize adaptation speed. HyPE uses a generative hypothesis network to form potential models of state transition dynamics, then eliminates incorrect models through strategically devised experiments. Evaluated on a symbolic version of the Alchemy game, HyPE outpaces baseline methods in adaptation speed and model accuracy, validating its potential to enhance reinforcement learning adaptation in rapidly evolving settings.
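To make the idea of eliminating candidate models through planned experiments concrete, the following is a minimal sketch and not the HyPE implementation. It assumes a toy setting in which the candidate transition models are four hand-written functions over integer states (standing in for the output of a generative hypothesis network), and it assumes one plausible selection rule: pick the action that leaves the fewest candidates in the worst case. The names `HYPOTHESES`, `pick_experiment`, `eliminate`, and `identify_dynamics` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

State = int
Action = int
Hypothesis = Callable[[State, Action], State]

# Hypothetical candidate transition models over states 0..7.
# In HyPE these would come from a generative hypothesis network;
# here they are fixed by hand purely for illustration.
HYPOTHESES: Dict[str, Hypothesis] = {
    "add": lambda s, a: (s + a) % 8,
    "sub": lambda s, a: (s - a) % 8,
    "xor": lambda s, a: s ^ a,
    "mul": lambda s, a: (s * a) % 8,
}


def pick_experiment(hypotheses: Dict[str, Hypothesis], state: State,
                    actions: List[Action]) -> Action:
    """Assumed rule: choose the action that minimizes the largest group of
    candidates predicting the same next state (worst-case survivors)."""
    def worst_case_survivors(action: Action) -> int:
        predictions = Counter(h(state, action) for h in hypotheses.values())
        return max(predictions.values())
    return min(actions, key=worst_case_survivors)


def eliminate(hypotheses: Dict[str, Hypothesis], state: State,
              action: Action, observed: State) -> Dict[str, Hypothesis]:
    """Keep only the candidates whose prediction matches the observation."""
    return {name: h for name, h in hypotheses.items()
            if h(state, action) == observed}


def identify_dynamics(true_step: Hypothesis,
                      hypotheses: Dict[str, Hypothesis],
                      state: State, actions: List[Action],
                      max_steps: int = 10) -> Tuple[List[str], int]:
    """Alternate planned experiments and elimination until at most one
    candidate remains or the step budget is exhausted."""
    remaining = dict(hypotheses)
    steps = 0
    while len(remaining) > 1 and steps < max_steps:
        action = pick_experiment(remaining, state, actions)
        next_state = true_step(state, action)  # run the experiment
        remaining = eliminate(remaining, state, action, next_state)
        state = next_state
        steps += 1
    return sorted(remaining), steps


if __name__ == "__main__":
    # The environment secretly follows the "xor" rule.
    print(identify_dynamics(HYPOTHESES["xor"], HYPOTHESES, 3, [0, 1, 2, 3]))
```

In this toy run the agent never tries action 0 from state 3, because three of the four candidates predict the same outcome for it and the observation would rule out little. That contrast with passive exploration is the point of the sketch; the actual experiment-selection procedure used by HyPE may differ.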