Abstract
Automating activities through robots in unstructured environments, such as construction sites, has been a long-standing desire. However, the high degree of unpredictable events in these settings has resulted in far less adoption compared to more structured settings, such as manufacturing, where robots can be hard-coded or trained on narrowly defined datasets. Recently, pretrained foundation models, such as Large Language Models (LLMs), have demonstrated superior generalization capabilities by providing zero-shot solutions for problems not present in the training data, positioning them as a potential solution for introducing robots to unstructured environments. To this end, this study investigates the potential opportunities and challenges of pretrained foundation models from a multi-dimensional perspective. The study systematically reviews applications of foundation models in two fields, robotics and unstructured environments, and then synthesizes them with deliberative acting theory. Findings showed that the linguistic capabilities of LLMs have been utilized more than other features for improving perception in human-robot interactions. On the other hand, findings showed that LLMs have seen more applications in project management and safety in construction, and in natural hazard detection in disaster management. Synthesizing these findings, we located the current state of the art in this field on a five-level scale of automation, placing it at conditional automation. This assessment was then used to envision future scenarios, challenges, and solutions toward autonomous and safe unstructured environments. Our study can be seen as a benchmark to track our progress toward that future.