Abstract
Extraterrestrial rovers with a general-purpose robotic arm have many potential applications in lunar and planetary exploration. Introducing autonomy into such systems is desirable for increasing the time that rovers can spend gathering scientific data and collecting samples. This work investigates the applicability of deep reinforcement learning for vision-based robotic grasping of objects on the Moon. A novel simulation environment with procedurally-generated datasets is created to train agents under challenging conditions in unstructured scenes with uneven terrain and harsh illumination. A model-free off-policy actor-critic algorithm is then employed for end-to-end learning of a policy that directly maps compact octree observations to continuous actions in Cartesian space. Experimental evaluation indicates that 3D data representations enable more effective learning of manipulation skills when compared to traditionally used image-based observations. Domain randomization improves the generalization of learned policies to novel scenes with previously unseen objects and different illumination conditions. To this end, we demonstrate zero-shot sim-to-real transfer by evaluating trained agents on a real robot in a Moon-analogue facility.