BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

  • 2025-10-28 07:58:39
  • Wentao Tan, Bowen Wang, Heng Zhi, Chenyu Liu, Zhe Li, Jian Liu, Zengrong Lin, Yukun Dai, Yipeng Chen, Wenjie Yang, Enci Xie, Hao Xue, Baixu Ji, Chen Xu, Zhibin Wang, Tianshi Wang, Lei Zhu, Heng Tao Shen

Abstract

Multimodal large language models (MLLMs) have advanced vision-language reasoning and are increasingly deployed in embodied agents. However, significant limitations remain: MLLMs generalize poorly across digital-physical spaces and embodiments; vision-language-action models (VLAs) produce low-level actions yet lack robust high-level embodied reasoning; and most embodied large language models (ELLMs) are constrained to digital space, with poor generalization to the physical world. Thus, unified models that operate seamlessly across digital and physical spaces while generalizing across embodiments and tasks remain absent. We introduce the \textbf{Boundless Large Model (BLM$_1$)}, a multimodal spatial foundation model that preserves instruction following and reasoning, incorporates embodied knowledge, and supports robust cross-embodiment control. BLM$_1$ integrates three key capabilities -- \textit{cross-space transfer, cross-task learning, and cross-embodiment generalization} -- via a two-stage training paradigm. Stage I injects embodied knowledge into the MLLM through curated digital corpora while maintaining language competence. Stage II trains a policy module through an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, without fine-tuning the MLLM backbone. This process is supported by a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six progressively challenging tasks. Evaluations across digital and physical benchmarks show that a single BLM$_1$ instance outperforms four model families -- MLLMs, ELLMs, VLAs, and GMLMs -- achieving $\sim\!\textbf{6%}$ gains in digital tasks and $\sim\!\textbf{3%}$ in physical tasks.

 
