Unveiling Provider Bias in Large Language Models for Code Generation

Abstract

Large Language Models (LLMs) have emerged as the new recommendation engines, outperforming traditional methods in both capability and scope, particularly in code generation applications. Our research reveals a novel provider bias in LLMs, namely, without explicit input prompts, these models show systematic preferences for services from specific providers in their recommendations (e.g., favoring Google Cloud over Microsoft Azure). This bias holds significant implications for market dynamics and societal equilibrium, potentially promoting digital monopolies. It may also deceive users and violate their expectations, leading to various consequences. This paper presents the first comprehensive empirical study of provider bias in LLM code generation. We develop a systematic methodology encompassing an automated pipeline for dataset generation, incorporating 6 distinct coding task categories and 30 real-world application scenarios. Our analysis encompasses over 600,000 LLM-generated responses across seven state-of-the-art models, utilizing approximately 500 million tokens (equivalent to $5,000+ in computational costs). The study evaluates both the generated code snippets and their embedded service provider selections to quantify provider bias. Additionally, we conduct a comparative analysis of seven debiasing prompting techniques to assess their efficacy in mitigating these biases. Our findings demonstrate that LLMs exhibit significant provider preferences, predominantly favoring services from Google and Amazon, and can autonomously modify input code to incorporate their preferred providers without users' requests. Notably, we observe discrepancies between providers recommended in conversational contexts versus those implemented in generated code. The complete dataset and analysis results are available in our repository.
