Abstract
Continuous-time reinforcement learning (CTRL) provides a natural framework for sequential decision-making in dynamic environments where interactions evolve continuously over time. While CTRL has shown growing empirical success, its ability to adapt to varying levels of problem difficulty remains poorly understood. In this work, we investigate the instance-dependent behavior of CTRL and introduce a simple, model-based algorithm built on maximum likelihood estimation (MLE) with a general function approximator. Unlike existing approaches that estimate system dynamics directly, our method estimates the state marginal density to guide learning. We establish instance-dependent performance guarantees by deriving a regret bound that scales with the total reward variance and measurement resolution. Notably, the regret becomes independent of the specific measurement strategy when the observation frequency adapts appropriately to the problem's complexity. To further improve performance, our algorithm incorporates a randomized measurement schedule that enhances sample efficiency without increasing measurement cost. These results highlight a new direction for designing CTRL algorithms that automatically adjust their learning behavior based on the underlying difficulty of the environment.