AgentSearch/AgentSearchBench-Responses
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/AgentSearch/AgentSearchBench-Responses
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: task
    dtype: string
  - name: agent_id
    dtype: string
  - name: response
    dtype: string
  - name: time
    dtype: string
  splits:
  - name: train
    num_bytes: 182653139
    num_examples: 49040
  download_size: 80475361
  dataset_size: 182653139
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# AgentSearchBench Responses
**AgentSearchBench** is a large-scale benchmark for AI agent search, built from nearly 10,000 real-world agents sourced from the [GPT Store](https://chatgpt.com/gpts), [Google Cloud Marketplace](https://cloud.google.com/marketplace), and [AgentAI Platform](https://agent.ai/).
🌐 [Project Page](https://bingo-w.github.io/AgentSearchBench) • 💻 [Codebase](https://github.com/Bingo-W/AgentSearchBench)
---
## Overview
This repository contains the **raw agent execution responses** collected during the construction of AgentSearchBench. Candidate agents were executed against each task in the validation set, and their outputs were evaluated by an LLM Judge to produce execution-grounded relevance labels.
These responses are released to support reproducibility and to enable research into agent evaluation, output quality analysis, and judge calibration.
---
## Dataset Statistics
| Split | Responses |
|-------|-----------|
| Validation | 60,000+ |
Responses cover single-agent task queries from the validation set. The dataset metadata lists 49,040 examples, stored in a single split named `train`.
---
## Data Fields
- `id`: Unique identifier for each response.
- `task`: Task associated with the response.
- `agent_id`: Identifier of the agent associated with the response.
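- `response`: Raw text of the agent's response to the task.
- `time`: End-to-end latency of the agent execution, stored as a string (the dataset schema names this column `time`).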
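
The field list above can be checked against a row with a few lines of code. The sketch below assumes the column names given in the dataset metadata (`id`, `task`, `agent_id`, `response`, `time`, all strings); the helper `missing_or_invalid_fields` and the example row are hypothetical and exist only for illustration.

```python
from typing import Mapping

# Column names as listed under `dataset_info.features` in the card metadata.
EXPECTED_FIELDS = ("id", "task", "agent_id", "response", "time")


def missing_or_invalid_fields(row: Mapping[str, object]) -> list[str]:
    """Return the expected field names that are absent from `row` or not strings."""
    return [name for name in EXPECTED_FIELDS if not isinstance(row.get(name), str)]


# Hypothetical row, invented for illustration (not taken from the dataset).
example_row = {
    "id": "resp-0001",
    "task": "Summarize a quarterly sales report",
    "agent_id": "agent-42",
    "response": "Here is a summary ...",
    "time": "3.2",
}

print(missing_or_invalid_fields(example_row))  # prints []
```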
---
## Usage
```python
from datasets import load_dataset

# Fetch the dataset from the Hugging Face Hub (default config).
ds = load_dataset("AgentSearch/AgentSearchBench-Responses")
```
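
Since each row pairs one task with one agent's output, a common first step is to collect all candidate responses per task. The following is a minimal sketch, not part of the official codebase: `responses_by_task` is a hypothetical helper, the sample rows are invented, and it assumes rows are dict-like with the `task`, `agent_id`, and `response` columns from the metadata. With the loaded dataset, the rows would come from `ds["train"]`, the split name given in the metadata.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def responses_by_task(rows: Iterable[Mapping[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Map each task to the (agent_id, response) pairs collected for it."""
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for row in rows:
        grouped[row["task"]].append((row["agent_id"], row["response"]))
    return dict(grouped)


# Hypothetical rows for illustration; with the real data, pass `ds["train"]` instead.
sample_rows = [
    {"id": "r1", "task": "task A", "agent_id": "agent-1", "response": "answer 1", "time": "1.0"},
    {"id": "r2", "task": "task A", "agent_id": "agent-2", "response": "answer 2", "time": "2.5"},
    {"id": "r3", "task": "task B", "agent_id": "agent-1", "response": "answer 3", "time": "0.7"},
]

grouped = responses_by_task(sample_rows)
print({task: len(pairs) for task, pairs in grouped.items()})  # prints {'task A': 2, 'task B': 1}
```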
---
## Related Datasets
| Dataset | Description |
|---------|-------------|
| [AgentSearchBench-Tasks](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Tasks) | Benchmark tasks: single-agent queries, multi-agent queries, and task descriptions |
| [AgentSearchBench-Agents](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Agents) | The AgentBase dataset: 9,759 real-world AI agents with metadata |
---
## Citation
```bibtex
@article{}
```
Provider: AgentSearch



