When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents

Abstract

Although Large Language Model (LLM)-based agents are increasingly used in financial trading, it remains unclear whether they can reason and adapt in live markets, as most studies test models instead of agents, cover limited periods and assets, and rely on unverified data. To address these gaps, we introduce Agent Market Arena (AMA), the first lifelong, real-time benchmark for evaluating LLM-based trading agents across multiple markets. AMA integrates verified trading data, expert-checked news, and diverse agent architectures within a unified trading framework, enabling fair and continuous comparison under real conditions. It implements four agents, including InvestorAgent as a single-agent baseline, TradeAgent and HedgeFundAgent with different risk styles, and DeepFundAgent with memory-based reasoning, and evaluates them across GPT-4o, GPT-4.1, Claude-3.5-haiku, Claude-sonnet-4, and Gemini-2.0-flash. Live experiments on both cryptocurrency and stock markets demonstrate that agent frameworks display markedly distinct behavioral patterns, spanning from aggressive risk-taking to conservative decision-making, whereas model backbones contribute less to outcome variation. AMA thus establishes a foundation for rigorous, reproducible, and continuously evolving evaluation of financial reasoning and trading intelligence in LLM-based agents.
