Abstract
Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), built on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova bench is available at https://github.com/antgroup/Finova.