AgentSearch/AgentSearchBench-Responses

Name: AgentSearch/AgentSearchBench-Responses
Creator: AgentSearch
Published: 2026-04-19 20:07:52
License: No description available

Hugging Face, updated 2026-04-19, indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/AgentSearch/AgentSearchBench-Responses


Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: task
    dtype: string
  - name: agent_id
    dtype: string
  - name: response
    dtype: string
  - name: time
    dtype: string
  splits:
  - name: train
    num_bytes: 182653139
    num_examples: 49040
  download_size: 80475361
  dataset_size: 182653139
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# AgentSearchBench Responses

**AgentSearchBench** is a large-scale benchmark for AI agent search, built from nearly 10,000 real-world agents sourced from the [GPT Store](https://chatgpt.com/gpts), [Google Cloud Marketplace](https://cloud.google.com/marketplace), and [AgentAI Platform](https://agent.ai/).

🌐 [Project Page](https://bingo-w.github.io/AgentSearchBench) • 💻 [Codebase](https://github.com/Bingo-W/AgentSearchBench)

---

## Overview

This repository contains the **raw agent execution responses** collected during the construction of AgentSearchBench. Candidate agents were executed against each task in the validation set, and their outputs were evaluated by an LLM Judge to produce execution-grounded relevance labels.

These responses are released to support reproducibility and to enable research into agent evaluation, output quality analysis, and judge calibration.

---

## Dataset Statistics

| Split | Responses |
|-------|-----------|
| Validation | 60,000+ |

Responses cover single-agent task queries from the validation set.

---

## Data Fields

- `id`: Unique identifier for each response.
- `task`: Task associated with the response.
- `agent_id`: Identifier of the agent that produced the response.
- `response`: Response content returned by the agent.
- `time`: End-to-end latency, stored as a string.

---

## Usage

```python
from datasets import load_dataset

ds = load_dataset("AgentSearch/AgentSearchBench-Responses")
```

---

## Related Datasets

| Dataset | Description |
|---------|-------------|
| [AgentSearchBench-Tasks](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Tasks) | Benchmark tasks: single-agent queries, multi-agent queries, and task descriptions |
| [AgentSearchBench-Agents](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Agents) | The AgentBase dataset: 9,759 real-world AI agents with metadata |

---

## Citation

```bibtex
@article{}
```
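As a rough illustration of how rows with the fields described under Data Fields might be explored, the sketch below groups responses by task and counts responses per agent. The helper names (`responses_per_agent`, `group_by_task`) and the sample rows are hypothetical and not part of the dataset or its codebase; the `time` values in particular are placeholders, since the card does not document the string format of that column.

```python
from collections import Counter, defaultdict


def responses_per_agent(rows):
    """Count how many responses each agent_id contributed."""
    return Counter(row["agent_id"] for row in rows)


def group_by_task(rows):
    """Map each task string to a list of (agent_id, response) pairs."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["task"]].append((row["agent_id"], row["response"]))
    return dict(grouped)


# Hypothetical rows that mirror the declared schema
# (id, task, agent_id, response, time). All values are invented.
sample_rows = [
    {"id": "r1", "task": "Summarize a sales report", "agent_id": "agent-a",
     "response": "Summary text", "time": "1.2"},
    {"id": "r2", "task": "Summarize a sales report", "agent_id": "agent-b",
     "response": "Another summary", "time": "0.8"},
    {"id": "r3", "task": "Translate a contract clause", "agent_id": "agent-a",
     "response": "Translated clause", "time": "2.5"},
]

per_agent = responses_per_agent(sample_rows)
by_task = group_by_task(sample_rows)

# With the `datasets` library installed, the same helpers accept the real
# split, which the metadata above names `train`:
#   from datasets import load_dataset
#   ds = load_dataset("AgentSearch/AgentSearchBench-Responses", split="train")
#   print(responses_per_agent(ds).most_common(5))
```

The helpers only assume that each row behaves like a dictionary keyed by column name, which is how a `datasets` split yields rows when iterated, so they can be tried on a handful of rows before running over the full split.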


Provider:

AgentSearch
