Abstract
Sequential decision-making in high-dimensional continuous action spaces, particularly in stochastic environments, faces significant computational challenges. We explore this challenge in the traditional offline RL setting, where an agent must learn how to make decisions based on data collected through a stochastic behavior policy. We present Latent Macro Action Planner (L-MAP), which addresses this challenge by learning a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE), effectively reducing action dimensionality. L-MAP employs a separate learned prior model that acts as a latent transition model and allows efficient sampling of plausible actions. During planning, our approach accounts for stochasticity in both the environment and the behavior policy by using Monte Carlo tree search (MCTS). In offline RL settings, including stochastic continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns. Empirical results demonstrate that L-MAP maintains low decision latency despite increased action dimensionality. Notably, across tasks ranging from continuous control with inherently stochastic dynamics to high-dimensional robotic hand manipulation, L-MAP significantly outperforms existing model-based methods and performs on par with strong model-free actor-critic baselines, highlighting the effectiveness of the proposed approach for planning in complex and stochastic environments with high-dimensional action spaces.
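
To make the idea of discrete latent actions concrete, the following is a minimal sketch of the generic vector-quantization step used in VQ-VAEs: a continuous latent vector is replaced by the index of its nearest codebook entry. It is not the L-MAP implementation. The function name `quantize`, the toy codebook, and the stand-in encoder outputs `z_e` are all assumptions made for illustration, and the state-conditional encoder, the decoder, and the prior model are omitted.

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    z_e:      (n, d) array of continuous latents (hypothetical encoder outputs)
    codebook: (K, d) array of learned code vectors (toy values here)
    Returns the integer code indices and the corresponding code vectors.
    """
    # Squared Euclidean distance between every latent and every code: shape (n, K).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

# Toy codebook with K = 4 codes in a 2-dimensional latent space (assumed values).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Stand-ins for two encoded macro-actions; a real encoder would produce these
# from a state and a short sequence of continuous actions.
z_e = np.array([[0.9, 0.1], [0.2, 0.8]])

idx, z_q = quantize(z_e, codebook)
print(idx)  # one integer index per macro-action
```

Under these assumptions, a planner would search over the integer indices in `idx` rather than over the continuous action vectors they stand for, which is the sense in which discrete latent actions shrink the search space.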