Abstract
Efficiently learning and executing long-horizon mobile manipulation (MoMa) tasks is crucial for advancing robotics in household and workplace settings. However, current MoMa models are data-inefficient, underscoring the need for improved models, and evaluating the efficiency of such models requires realistically sized benchmarks, which do not yet exist. To address this, we introduce the LAMBDA ({\lambda}) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is more feasible to collect. The benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and replay-verifiability, ensuring robust learning and evaluation. We benchmark several models, including learning-based models and a neuro-symbolic modular approach combining foundation models with task and motion planning. Learning-based models show suboptimal success rates, even when leveraging pretrained weights, underscoring significant data inefficiencies. However, the neuro-symbolic approach performs significantly better while being more data-efficient. These findings highlight the need for more data-efficient learning-based MoMa approaches. {\lambda} addresses this gap by serving as a key benchmark for evaluating the data efficiency of future models on household robotics tasks.