Abstract
Learning to control robots without requiring engineered models has been a long-term goal, promising diverse and novel applications. Yet, reinforcement learning has only achieved limited impact on real-time robot control due to its high demand of real-world interactions. In this work, by leveraging a learnt probabilistic model of drone dynamics, we learn a thrust-attitude controller for a quadrotor through model-based reinforcement learning. No prior knowledge of the flight dynamics is assumed; instead, a sequential latent variable model, used generatively and as an online filter, is learnt from raw sensory input. The controller and value function are optimised entirely by propagating stochastic analytic gradients through generated latent trajectories. We show that "learning to fly" can be achieved with less than 30 minutes of experience with a single drone, and can be deployed solely using onboard computational resources and sensors, on a self-built drone.