Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, the opaqueness of learned policies and the stochastic nature of both agents and environments make it difficult to test the behavior of deep RL agents. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For safety testing, our framework uses a search algorithm to find a reference trace that solves the RL task. The backtracking states of the search, called boundary states, represent safety-critical situations. We create safety test suites that evaluate how well the RL agent escapes safety-critical situations near these boundary states. For robust performance testing, we create a diverse set of traces via fuzz testing. These fuzz traces are used to bring the agent into a wide variety of potentially unknown states, from which the average performance of the agent is compared to the average performance of the fuzz traces. We apply our search-based testing approach to RL for Nintendo's Super Mario Bros.
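To make the idea of a reference trace and boundary states concrete, the following is a minimal sketch of a depth-first search with backtracking on a toy deterministic grid world. It is an illustration under stated assumptions, not the framework's implementation: the function `search_reference_trace`, the grid layout, the action names, and the rule that a state counts as a boundary state when one of its actions leads directly into an unsafe state are all hypothetical choices made for this example.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

State = Tuple[int, int]  # (x, y) position in a hypothetical toy grid


def search_reference_trace(
    start: State,
    actions: Sequence[str],
    step: Callable[[State, str], State],
    is_goal: Callable[[State], bool],
    is_unsafe: Callable[[State], bool],
) -> Tuple[Optional[List[str]], List[State]]:
    """Depth-first search with backtracking.

    Returns the action sequence that reaches the goal (or None) and the
    states from which the search had to back off an unsafe successor.
    """
    visited: Set[State] = {start}
    boundary: List[State] = []
    trace: List[str] = []

    def dfs(state: State) -> bool:
        if is_goal(state):
            return True
        for action in actions:
            nxt = step(state, action)
            if is_unsafe(nxt):
                # Assumption: record the last safe state before an unsafe one.
                if state not in boundary:
                    boundary.append(state)
                continue
            if nxt in visited:
                continue
            visited.add(nxt)
            trace.append(action)
            if dfs(nxt):
                return True
            trace.pop()  # dead end: undo the action and try the next one
        return False

    found = dfs(start)
    return (trace if found else None), boundary


# Hypothetical toy task: a 5x2 grid with a hole at (2, 0); the goal is x == 4.
HOLES = {(2, 0)}
MOVES = {"right": (1, 0), "up": (0, 1), "down": (0, -1)}


def toy_step(state: State, action: str) -> State:
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), 4)  # clamp to the grid
    y = min(max(state[1] + dy, 0), 1)
    return (x, y)


reference_trace, boundary_states = search_reference_trace(
    start=(0, 0),
    actions=["right", "up", "down"],
    step=toy_step,
    is_goal=lambda s: s[0] == 4,
    is_unsafe=lambda s: s in HOLES,
)
print(reference_trace)
print(boundary_states)
```

In this toy setting, a safety test would place the agent at or near a recorded boundary state and check whether its policy avoids the hole. A real environment such as a game emulator would additionally need a way to save and restore states so that the search can backtrack.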