ibm-research/ITBench-Lite
Source: Hugging Face. Updated: 2026-04-21. Indexed: 2026-02-07.
Download link:
https://hf-mirror.com/datasets/ibm-research/ITBench-Lite
Description:
ITBench-Lite is a benchmark dataset, developed by IBM Research, for evaluating large language models (LLMs) and AI agents on real-world IT automation tasks. It contains 50 scenarios across two domains: Site Reliability Engineering (SRE) and Financial Operations (FinOps). The SRE domain includes 35 scenarios with environment snapshots for incident diagnosis, while the FinOps domain includes 15 scenarios with synthetic cost anomaly data for spend analysis. The dataset provides observability data such as alerts, metrics, logs, and traces, along with Kubernetes events and resource states. It also details the tasks, such as fault localization and cost anomaly analysis, and outlines the dataset's limitations and related resources.
Provider:
ibm-research



