Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models

Abstract

When encountering increasingly frequent performance improvements or cost reductions from a new large language model (LLM), developers of applications leveraging LLMs must decide whether to take advantage of these improvements or stay with older tried-and-tested models. Low perceived switching frictions can lead to choices that do not consider more subtle behavior changes that the transition may induce. Our experiments use a popular game-theoretic behavioral economics model of trust to show stark differences in the trusting behavior of OpenAI's and DeepSeek's models. We highlight a collapse in the economic trust behavior of the o1-mini and o3-mini models as they reconcile profit-maximizing and risk-seeking with future returns from trust, and contrast it with DeepSeek's more sophisticated and profitable trusting behavior that stems from an ability to incorporate deeper concepts like forward planning and theory-of-mind. As LLMs form the basis for high-stakes commercial systems, our results highlight the perils of relying on LLM performance benchmarks that are too narrowly defined and suggest that careful analysis of their hidden fault lines should be part of any organization's AI strategy.
