Abstract
End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits: easy deployment, task generalization, and real-time execution capability. Prior end-to-end DRL-based methods have showcased the ability to deploy learned controllers onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, adding obstacles increases the number of possible interactions exponentially, which makes RL policies harder to train. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents with a curriculum and a replay buffer of the clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism that attends to interactions with neighboring robots and obstacles. This is the first successful demonstration of such a mechanism on swarm-behavior policies deployed on severely compute-constrained hardware. Our work is the first to demonstrate that neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL can transfer zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Video demonstrations are available on the project website at: https://sites.google.com/view/obst-avoid-swarm-rl.