Abstract
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.