Abstract
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
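As a rough illustration of what predicting performance from much smaller training runs can look like, the sketch below fits a power law to the losses of a few small runs and extrapolates to the full compute budget. The abstract does not state the functional form used, so the pure power law L(C) = a * C^b is an assumption here, and the function names, constants, and data points are hypothetical and synthetic rather than reported results.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss = a * compute**b by least squares in log-log space.

    Assumed functional form, chosen for illustration only.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space (the exponent)
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to linear scale
    return a, b


def predict(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** b


# Synthetic data: compute is expressed as a fraction of the full budget,
# and the losses are generated from made-up parameters, not measured.
TRUE_A, TRUE_B = 5.0, -0.05
small_runs = [1e-6, 1e-5, 1e-4, 1e-3]  # none exceeds 1/1,000th of the budget
losses = [TRUE_A * c ** TRUE_B for c in small_runs]

a, b = fit_power_law(small_runs, losses)
predicted_full = predict(a, b, 1.0)    # extrapolate to the full budget
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating parameters; real training runs are noisy and may call for a different functional form, such as one with an added constant term.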