A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments

Abstract

In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within it, in the case where the only available information is the raw sound from the environment, as a simulated human listener placed in the environment would hear it. For this purpose we create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem. We also create an autonomous agent based on the PPO online reinforcement learning algorithm and attempt to train it to solve these environments. Our experiments show that our agent achieves adequate performance and generalization ability in both environments, as measured by quantitative metrics, even when only a limited amount of training data is available or the environment parameters shift in ways not encountered during training. We also show that a degree of knowledge transfer between the environments is possible for the agent.
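The abstract names PPO but does not give its update rule. For readers unfamiliar with it, the following is a minimal sketch of the standard clipped surrogate objective from the original PPO formulation (PPO-Clip). It is not the authors' implementation. The function name `ppo_clip_objective`, the plain-list inputs, and the clipping parameter `eps = 0.2` (a common default) are illustrative assumptions, since the abstract does not report the loss details or hyperparameters used.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,  # assumed clipping range; not taken from the paper
) -> float:
    """Mean PPO-Clip surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration only.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to push the policy ratio far from 1 when that would increase the objective, which keeps each online update close to the data-collecting policy. For example, a ratio of 2 with a positive advantage contributes only `1 + eps`, while the same ratio with a negative advantage keeps its full penalty.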
