Abstract
One major factor impeding more widespread adoption of deep neural networks (DNNs) is their limited robustness, a property that is essential for safety-critical applications such as autonomous driving. This has motivated much recent work on adversarial attacks for DNNs, which mostly focuses on pixel-level perturbations void of semantic meaning. In contrast, we present a general framework for adversarial black-box attacks on agents, in which the attacks are intimately related to the semantics of the task being performed by the agent. To do this, our proposed adversary (denoted as BBGAN) is trained to appropriately parametrize the environment (black box) with which the agent interacts, such that this agent performs poorly on its dedicated task. We illustrate the application of our BBGAN framework on three different tasks (primarily targeting aspects of autonomous navigation): object detection, self-driving, and autonomous UAV racing. On these tasks, our approach can be used to generate failure cases that fool an agent consistently.