Abstract
Effective reinforcement learning (RL) for sepsis treatment depends on learning stable, clinically meaningful state representations from irregular ICU time series. While previous works have explored representation learning for this task, the critical challenge of training instability in sequential representations, and its detrimental impact on policy performance, has been overlooked. This work demonstrates that a Controlled Differential Equation (CDE) state representation can support strong RL policies when two key conditions are met: (1) training stability is ensured through early stopping or stabilization methods, and (2) acuity-aware representations are enforced through correlation regularization with clinical scores (SOFA, SAPS-II, OASIS). Experiments on the MIMIC-III sepsis cohort reveal that a stable CDE autoencoder produces representations strongly correlated with acuity scores and enables RL policies with superior performance (WIS return $> 0.9$). In contrast, unstable CDE training leads to degraded representations and policy failure (WIS return $\sim 0$). Visualizations of the latent space show that stable CDEs not only separate survivor and non-survivor trajectories but also reveal clear acuity score gradients, whereas unstable training fails to capture either pattern. These findings provide practical guidelines for using CDEs to encode irregular medical time series in clinical RL and emphasize the need for training stability in sequential representation learning.