Abstract
The rise of reinforcement learning (RL) in critical real-world applications demands a fundamental rethinking of privacy in AI systems. Traditional privacy frameworks, designed to protect isolated data points, fall short for sequential decision-making systems where sensitive information emerges from temporal patterns, behavioral strategies, and collaborative dynamics. Modern RL paradigms, such as federated RL (FedRL) and RL with human feedback (RLHF) in large language models (LLMs), exacerbate these challenges by introducing complex, interactive, and context-dependent learning environments that traditional methods do not address. In this position paper, we argue for a new privacy paradigm built on four core principles: multi-scale protection, behavioral pattern protection, collaborative privacy preservation, and context-aware adaptation. These principles expose inherent tensions between privacy, utility, and interpretability that must be navigated as RL systems become more pervasive in high-stakes domains like healthcare, autonomous vehicles, and decision support systems powered by LLMs. To tackle these challenges, we call for the development of new theoretical frameworks, practical mechanisms, and rigorous evaluation methodologies that collectively enable effective privacy protection in sequential decision-making systems.