Abstract
Deep Reinforcement Learning has demonstrated the potential of neural networks tuned with gradient descent for solving complex tasks in well-delimited environments. However, these neural systems are slow learners, producing specialised agents with no mechanism to continue learning beyond their training curriculum. In contrast, biological synaptic plasticity is persistent and manifold, and has been hypothesised to play a key role in executive functions such as working memory and cognitive flexibility, potentially supporting more efficient and generic learning abilities. Inspired by this, we propose to build networks with dynamic weights, able to continually perform self-reflexive modification as a function of their current synaptic state and action-reward feedback, rather than a fixed network configuration. The resulting model, MetODS (for Meta-Optimized Dynamical Synapses), is a broadly applicable meta-reinforcement learning system able to learn efficient and powerful control rules in the agent policy space. A single layer with dynamic synapses can perform one-shot learning, generalize navigation principles to unseen environments and demonstrate a strong ability to learn adaptive motor policies, comparing favourably with previous meta-reinforcement learning approaches.