Benchmarking LLMs in Web API Integration Tasks

  • 2025-11-03 16:12:09
  • Daniel Maninger, Leon Chemnitz, Amir Molzam Sharifloo, Jannis Brugger, Mira Mezini

Abstract

API integration is a cornerstone of our digital infrastructure, enabling software systems to connect and interact. However, as shown by many studies, writing or generating correct code to invoke APIs, particularly web APIs, is challenging. Although large language models (LLMs) have become popular in software development, their effectiveness in automating the generation of web API integration code remains unexplored. In order to address this, we present WAPIIBench, a dataset and evaluation pipeline designed to assess the ability of LLMs to generate web API invocation code. Our experiments with several open-source LLMs reveal that generating API invocations poses a significant challenge, resulting in hallucinated endpoints, incorrect argument usage, and other errors. None of the evaluated open-source models was able to solve more than 40% of the tasks.

 
