Abstract
Distributionally Robust Reinforcement Learning (DR-RL) aims to derive a policy that optimizes worst-case performance within a predefined uncertainty set. Despite extensive research, previous DR-RL algorithms have predominantly favored model-based approaches, and few model-free methods offer convergence guarantees or sample complexity bounds. This paper proposes a model-free DR-RL algorithm that leverages the Multi-level Monte Carlo (MLMC) technique to close this gap. Our approach integrates a threshold mechanism that ensures the algorithm requires only a finite number of samples to implement, a significant improvement over previous model-free algorithms. We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provide finite-sample analyses for all three cases. Our algorithms are the first model-free DR-RL approach with finite sample complexity for total variation and Chi-square divergence uncertainty sets, and they offer improved sample complexity and broader applicability than existing model-free DR-RL algorithms for the KL divergence model. The complexities of our method are the tightest known results for all three uncertainty models in model-free DR-RL, underscoring the effectiveness and efficiency of our algorithm and highlighting its potential for practical applications.