Abstract
We introduce the Stochastic Correlated Obstacle Scene (SCOS) problem, a navigation setting with spatially correlated obstacles of uncertain blockage status and realistically constrained sensors that provide noisy readings and costly disambiguation. Modeling the spatial correlation with a Gaussian Random Field (GRF), we develop Bayesian belief updates that refine blockage probabilities, and we use the posteriors to reduce the search space for efficiency. To find the optimal traversal policy, we propose a novel two-stage learning framework. An offline phase learns a robust base policy via optimistic policy iteration augmented with an information bonus to encourage exploration in informative regions, followed by an online rollout policy with periodic base updates via a Bayesian mechanism for information adaptation. This framework supports both Monte Carlo point estimation and distributional reinforcement learning (RL) to learn full cost distributions, leading to stronger uncertainty quantification. We establish theoretical benefits of correlation-aware updating and a convergence property under posterior sampling. Comprehensive empirical evaluations across varying obstacle densities and sensor capabilities demonstrate consistent performance gains over baselines. This framework addresses navigation challenges in environments with adversarial interruptions or clustered natural hazards.