
Replication Data for: SOCBench-SC: Automatic Benchmarking of LLM-Based Service Compositions

Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/replication-data-socbench-sc-automatic-benchmarking-llm-based-service-compositions
Description:
Automated service composition integrates independent (Web) services into complex workflows. Classical approaches relied on state machines or AI planning, whereas modern service documentation uses semi-structured OpenAPI specifications that combine natural language with structured elements. Large Language Models (LLMs) show strong potential due to their advanced semantic understanding of such specifications, yet existing benchmarks address only service discovery and lack systematic methods to evaluate generated code. We contribute SOCBench-SC, the first static code analysis framework for systematically evaluating LLM-generated service compositions. Unlike manual or dynamic analysis, SOCBench-SC provides automated, scalable, and reproducible assessment of invoked endpoints, reducing manual effort and avoiding runtime errors. We implement a prototype for Python code using reaching definition analysis and apply it to the two public benchmarks SOCBench-D and RestBench. These combine natural language tasks with expected solution endpoints. For each benchmark case, we generate the service composition code using six state-of-the-art LLMs, which is then analyzed by SOCBench-SC. Correctness is assessed using an LLM judge with manual checks. Our results show that SOCBench-SC reliably identifies invoked endpoints and enables comparative LLM ranking. Larger LLMs achieve ≈80% F1 average endpoint correctness against benchmark solutions, while revealing systematic deficits in Retrieval-Augmented Generation (RAG) and LLM agents. These findings confirm SOCBench-SC as a promising automation framework for extending service discovery benchmarks to service compositions and a foundation for extensions such as workflow ordering and reliability.
Provided by:
Robin D. Pesl; Marco Aiello; Massimo Mecella; Jerin G. Mathew