
AgentSearch/AgentSearchBench-Responses

Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/AgentSearch/AgentSearchBench-Responses
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: task
    dtype: string
  - name: agent_id
    dtype: string
  - name: response
    dtype: string
  - name: time
    dtype: string
  splits:
  - name: train
    num_bytes: 182653139
    num_examples: 49040
  download_size: 80475361
  dataset_size: 182653139
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# AgentSearchBench Responses

**AgentSearchBench** is a large-scale benchmark for AI agent search, built from nearly 10,000 real-world agents sourced from the [GPT Store](https://chatgpt.com/gpts), [Google Cloud Marketplace](https://cloud.google.com/marketplace), and [AgentAI Platform](https://agent.ai/).

🌐 [Project Page](https://bingo-w.github.io/AgentSearchBench) • 💻 [Codebase](https://github.com/Bingo-W/AgentSearchBench)

---

## Overview

This repository contains the **raw agent execution responses** collected during the construction of AgentSearchBench. Candidate agents were executed against each task in the validation set, and an LLM Judge evaluated their outputs to produce execution-grounded relevance labels.

These responses are released to support reproducibility and to enable research into agent evaluation, output quality analysis, and judge calibration.

---

## Dataset Statistics

| Split | Responses |
|-------|-----------|
| Validation | 60,000+ |

Responses cover single-agent task queries from the validation set. The dataset metadata lists one published split, `train`, with 49,040 examples.

---

## Data Fields

- `id`: Unique identifier for each response.
- `task`: Task associated with the response.
- `agent_id`: Identifier of the agent that produced the response.
- `response`: Response content.
- `time`: End-to-end latency, stored as a string (this is the column name in the schema above; the field list originally called it `latency`).

---

## Usage

```python
from datasets import load_dataset

# Downloads the default config; the metadata lists a single "train" split.
ds = load_dataset("AgentSearch/AgentSearchBench-Responses")
```

---

## Related Datasets

| Dataset | Description |
|---------|-------------|
| [AgentSearchBench-Tasks](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Tasks) | Benchmark tasks: single-agent queries, multi-agent queries, and task descriptions |
| [AgentSearchBench-Agents](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Agents) | The AgentBase dataset: 9,759 real-world AI agents with metadata |

---

## Citation

```bibtex
@article{}
```
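
The five columns in the dataset schema (`id`, `task`, `agent_id`, `response`, `time`) lend themselves to simple per-task grouping. The following is a minimal sketch and not part of the original dataset card. The helper names `parse_seconds`, `responses_per_task`, and `load_responses` are hypothetical, and the sketch assumes the string-typed `time` column holds a plain number, which the card does not confirm; values that do not parse are returned as `None` rather than guessed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def parse_seconds(raw: Optional[str]) -> Optional[float]:
    """Parse the string-typed `time` field.

    Assumption: the value is a plain numeric string. Anything else
    (missing, or in an unknown format) yields None.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def responses_per_task(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each task to the agent_ids that produced a response for it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        grouped[row["task"]].append(row["agent_id"])
    return dict(grouped)


def load_responses():
    """Load the published `train` split.

    Needs network access and the `datasets` package, so the import is
    kept local and this function is not called at import time.
    """
    from datasets import load_dataset

    return load_dataset("AgentSearch/AgentSearchBench-Responses", split="train")
```

With network access, `responses_per_task(load_responses())` would apply the same grouping to the published split, since iterating a `datasets.Dataset` yields one dictionary per row.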

Provider: AgentSearch