Abstract
Neuromorphic computing systems are set to revolutionize energy-constrained robotics by achieving orders-of-magnitude efficiency gains, while enabling native temporal processing. Spiking Neural Networks (SNNs) represent a promising algorithmic approach for these systems, yet their application to complex control tasks faces two critical challenges: (1) the non-differentiable nature of spiking neurons necessitates surrogate gradients with unclear optimization properties, and (2) the stateful dynamics of SNNs require training on sequences, which in reinforcement learning (RL) is hindered by limited sequence lengths during early training, preventing the network from bridging its warm-up period. We address these challenges by systematically analyzing surrogate gradient slope settings, showing that shallower slopes increase gradient magnitude in deeper layers but reduce alignment with true gradients. In supervised learning, we find no clear preference for fixed or scheduled slopes. The effect is much more pronounced in RL settings, where shallower slopes or scheduled slopes lead to a 2.1x improvement in both training and final deployed performance. Next, we propose a novel training approach that leverages a privileged guiding policy to bootstrap the learning process, while still exploiting online environment interactions with the spiking policy. Combining our method with an adaptive slope schedule for a real-world drone position control task, we achieve an average return of 400 points, substantially outperforming prior techniques, including Behavioral Cloning and TD3BC, which achieve at most -200 points under the same conditions. This work advances both the theoretical understanding of surrogate gradient learning in SNNs and practical training methodologies for neuromorphic controllers demonstrated in real-world robotic systems.