Abstract
We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow the notion of regret from online learning and empirically show that LLMs are capable of obtaining sub-linear regret.
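
To make the notion of sub-linear regret concrete, the following is a minimal sketch rather than the evaluation code used in this work. It assumes one common form of regret from online learning: the cumulative loss of the model minus the cumulative loss of a comparator predictor, tracked as the number of in-context exemplars grows. The function names, the choice of comparator, and the loss values are hypothetical and chosen only for illustration. Regret is sub-linear when the average regret per step shrinks toward zero.

```python
from itertools import accumulate


def cumulative_regret(model_losses, comparator_losses):
    """Running regret after each step.

    Assumed definition (not taken from the abstract): the sum of the
    model's per-step losses minus the sum of a comparator's per-step losses.
    """
    if len(model_losses) != len(comparator_losses):
        raise ValueError("loss sequences must have the same length")
    gaps = [m - c for m, c in zip(model_losses, comparator_losses)]
    return list(accumulate(gaps))


def average_regret(regret):
    """Regret divided by the number of steps seen so far."""
    return [r / t for t, r in enumerate(regret, start=1)]


# Hypothetical per-step absolute errors, for illustration only.
model_losses = [4.0, 2.0, 1.0, 1.0]
comparator_losses = [1.0, 1.0, 1.0, 1.0]

regret = cumulative_regret(model_losses, comparator_losses)
avg = average_regret(regret)
print(regret)
print(avg)
```

In this toy sequence the cumulative regret stops growing once the model matches the comparator, so the average regret decreases with each additional step, which is the behavior the term sub-linear describes.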