Abstract
Large Language Models (LLMs) such as GPT-4, trained on large amounts of data spanning multiple domains, exhibit significant reasoning, understanding, and planning capabilities across various tasks. This study presents the first work on integrating the Arabic language into the Vision-and-Language Navigation (VLN) domain in robotics, an area that has been notably underexplored in existing research. We perform a comprehensive evaluation of state-of-the-art multilingual Small Language Models (SLMs), including GPT-4o mini, Llama 3 8B, and Phi-3 medium 14B, alongside the Arabic-centric LLM Jais. Our approach uses the NavGPT framework, a purely LLM-based instruction-following navigation agent, to assess the impact of language on navigation reasoning through zero-shot sequential action prediction on the R2R dataset. Our experiments demonstrate that the framework is capable of high-level planning for navigation tasks when provided with instructions in either English or Arabic. However, certain models struggled with reasoning and planning in Arabic due to inherent limitations in their capabilities, sub-optimal performance, and parsing issues. These findings highlight the importance of enhancing planning and reasoning capabilities in language models for effective navigation, emphasizing this as a key area for further development while also unlocking the potential of Arabic-language models for impactful real-world applications.