Abstract
In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by the OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We specifically focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table approach to rank AI models based on their comprehensive performance across various disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet shows highly competitive overall performance relative to GPT-4o, even surpassing GPT-4o on a few subjects (i.e., Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community significantly lags behind these proprietary models. (4) The performance of these models on this benchmark has been less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).
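To make the medal table idea concrete, the following is a minimal sketch and not the report's actual implementation. It assumes that the top three models in each discipline receive gold, silver, and bronze, and that models are then ordered by gold count, then silver, then bronze. The abstract does not spell out these rules, so they are assumptions borrowed from conventional Olympic medal tables. The function name `medal_table`, the model names, and all scores are hypothetical placeholders, not results from the benchmark.

```python
def medal_table(scores):
    """Rank models by medals won across disciplines.

    scores: {discipline: {model: score}}, higher score is better.
    Returns a list of (model, gold, silver, bronze), best first.
    Assumed rule (not taken from the report): top three per discipline
    get gold/silver/bronze; order by gold, then silver, then bronze.
    """
    medals = {}
    for by_model in scores.values():
        # Order the models in this discipline from best to worst.
        ranked = sorted(by_model, key=by_model.get, reverse=True)
        for place, model in enumerate(ranked[:3]):
            medals.setdefault(model, [0, 0, 0])[place] += 1
        # Models outside the top three still appear in the table.
        for model in ranked[3:]:
            medals.setdefault(model, [0, 0, 0])
    # Compare (gold, silver, bronze) tuples lexicographically.
    table = sorted(medals.items(), key=lambda kv: tuple(kv[1]), reverse=True)
    return [(model, *counts) for model, counts in table]


# Hypothetical scores for illustration only; not benchmark results.
scores = {
    "math":      {"model_a": 0.40, "model_b": 0.35, "model_c": 0.30, "model_d": 0.10},
    "physics":   {"model_a": 0.30, "model_b": 0.45, "model_c": 0.20, "model_d": 0.25},
    "chemistry": {"model_a": 0.50, "model_b": 0.55, "model_c": 0.40, "model_d": 0.45},
}

table = medal_table(scores)
for model, gold, silver, bronze in table:
    print(f"{model}: {gold} gold, {silver} silver, {bronze} bronze")
```

In this toy input, `model_b` ranks first with two golds even though `model_a` wins one discipline, which illustrates how a medal table rewards per-discipline wins rather than an averaged score. The sketch does not handle tied scores within a discipline or tied medal counts; a real ranking would need an explicit tie-breaking rule.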