Abstract
Deploying reinforcement learning (RL) in safety-critical settings is constrained by brittleness under distribution shift. We study out-of-distribution (OOD) detection for RL time series and introduce DEEDEE, a two-statistic detector that revisits representation-heavy pipelines with a minimal alternative. DEEDEE uses only an episodewise mean and an RBF kernel similarity to a training summary, capturing complementary global and local deviations. Despite its simplicity, DEEDEE matches or surpasses contemporary detectors across standard RL OOD suites, delivering a 600-fold reduction in compute (FLOPs / wall-time) and an average 5% absolute accuracy gain over strong baselines. Conceptually, our results indicate that diverse anomaly types often imprint on RL trajectories through a small set of low-order statistics, suggesting a compact foundation for OOD detection in complex environments.
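
To make the two-statistic idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each episode is an equal-length sequence of scalars, that the "training summary" is the per-step average trajectory over training episodes, and that an episode is flagged when either statistic crosses a threshold. The function names (`fit`, `is_ood`) and the parameters (`gamma`, `z_max`, `sim_min`) are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def episode_mean(episode):
    """Global statistic: mean of all values in one episode."""
    return mean(episode)


def rbf_similarity(episode, summary, gamma=1.0):
    """Local statistic: RBF kernel exp(-gamma * ||x - s||^2) to a summary."""
    if len(episode) != len(summary):
        raise ValueError("episode and summary must have the same length")
    sq_dist = sum((x - s) ** 2 for x, s in zip(episode, summary))
    return math.exp(-gamma * sq_dist)


def fit(train_episodes):
    """Summarize training episodes (assumed equal-length scalar sequences)."""
    means = [episode_mean(ep) for ep in train_episodes]
    # Assumption: the training summary is the per-step average trajectory.
    summary = [mean(step) for step in zip(*train_episodes)]
    # Guard against zero spread so the z-score below stays finite.
    sigma = pstdev(means) or 1e-8
    return {"mu": mean(means), "sigma": sigma, "summary": summary}


def is_ood(episode, model, z_max=3.0, sim_min=0.5, gamma=1.0):
    """Flag an episode if either statistic deviates (thresholds are assumed)."""
    z = abs(episode_mean(episode) - model["mu"]) / model["sigma"]
    sim = rbf_similarity(episode, model["summary"], gamma)
    return z > z_max or sim < sim_min


# Toy usage with made-up numbers, purely illustrative.
train = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
model = fit(train)
shifted = [5.0, 5.0, 5.0, 5.0]        # global shift in the mean
oscillating = [2.5, -1.5, 2.5, -1.5]  # same mean as training, different shape
print(is_ood(shifted, model), is_ood(oscillating, model))
```

In this toy setup the oscillating episode has the same mean as the training data, so only the kernel similarity can flag it, while the shifted episode is already caught by the mean. This is one way to read the claim that the two statistics capture complementary global and local deviations.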