Abstract
Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlies the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.