Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

  • 2025-07-30 13:52:22
  • Supawich Sitdhipol, Waritwong Sukprasongdee, Ekapol Chuangsuwanich, Rina Tse

Abstract

Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
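The fusion step the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a discrete 3x3 grid map, and the grid values, the example utterance, and the helper names `normalize`, `fuse`, and `nll` are all hypothetical. It only shows how a likelihood grounded from language could be combined with a robot's sensor-based belief by Bayes' rule, and how NLL scores the result at a known target cell.

```python
import math

def normalize(grid):
    """Scale a 2D grid of non-negative weights so that it sums to 1."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("grid has no probability mass")
    return [[v / total for v in row] for row in grid]

def fuse(prior, likelihood):
    """Bayes' rule on a grid: posterior is proportional to prior * likelihood."""
    product = [[p * l for p, l in zip(p_row, l_row)]
               for p_row, l_row in zip(prior, likelihood)]
    return normalize(product)

def nll(prob_grid, true_cell):
    """Negative log-likelihood of the true cell under a normalized grid."""
    row, col = true_cell
    return -math.log(prob_grid[row][col])

# Hypothetical robot belief over target location from its own sensor (sums to 1).
robot_belief = normalize([
    [0.20, 0.10, 0.10],
    [0.20, 0.10, 0.10],
    [0.10, 0.05, 0.05],
])

# Hypothetical grounded likelihood for an utterance such as
# "the target is near the top-left corner" (values are made up, not model output).
language_likelihood = [
    [0.60, 0.20, 0.05],
    [0.20, 0.10, 0.05],
    [0.05, 0.05, 0.05],
]

posterior = fuse(robot_belief, language_likelihood)
true_cell = (0, 0)  # assumed ground-truth target location for scoring

print("NLL before fusion:", nll(robot_belief, true_cell))
print("NLL after fusion: ", nll(posterior, true_cell))
```

In this toy case the human observation agrees with the true location, so the fused posterior assigns it more mass and the NLL drops. A likelihood that is overconfident in the wrong cell would raise the NLL instead, which is why the abstract stresses that the likelihood must represent the uncertainty of human inputs.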

 
