AgentSearch/AgentSearchBench-Tasks

Name: AgentSearch/AgentSearchBench-Tasks
Creator: AgentSearch
Published: 2026-04-19 20:10:22
License: MIT (as declared in the dataset card metadata)

Hugging Face · Updated 2026-04-19 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/AgentSearch/AgentSearchBench-Tasks

Description:

---
license: mit
language:
- en
tags:
- agent
- search
- retrieval
- reranking
- benchmarking
size_categories:
- 1K<n<10K
configs:
- config_name: single-agent_task_query
  data_files:
  - split: validation
    path: single-agent_task_query/validation-00000-of-00001.parquet
  - split: test
    path: single-agent_task_query/test-00000-of-00001.parquet
- config_name: multi-agent_task_query
  data_files:
  - split: validation
    path: multi-agent_task_query/validation-00000-of-00001.parquet
  - split: test
    path: multi-agent_task_query/test-00000-of-00001.parquet
- config_name: task_description
  data_files:
  - split: validation
    path: task_description/validation-00000-of-00001.parquet
  - split: test
    path: task_description/test-00000-of-00001.parquet
---

# AgentSearchBench Tasks

**AgentSearchBench** is a large-scale benchmark for AI agent search, built from nearly 10,000 real-world agents sourced from the [GPT Store](https://chatgpt.com/gpts), [Google Cloud Marketplace](https://cloud.google.com/marketplace), and [AgentAI Platform](https://agent.ai/).

🌐 [Project Page](https://bingo-w.github.io/AgentSearchBench) • 💻 [Codebase](https://github.com/Bingo-W/AgentSearchBench)

---

## Overview

This repository contains the **benchmark tasks** for AgentSearchBench. Agent search is framed as both a retrieval and reranking problem, where relevance is grounded in real execution performance rather than textual similarity alone.

Tasks are generated by:

1. Creating concrete, executable queries from agent documentation.
2. Grouping and abstracting these into broader high-level task descriptions.

Agent relevance is assessed by executing candidate agents on each task and evaluating outputs via an LLM Judge, with human alignment validation.

---

## Dataset Statistics

| Split      | Total | Task Description | Single-Agent Task Query | Multi-Agent Task Query |
|------------|-------|------------------|-------------------------|------------------------|
| Validation | 3,211 | 259              | 2,452                   | 500                    |
| Test       | 798   | 65               | 633                     | 100                    |

---

## Configurations

This dataset contains three configurations, each representing a different query type:

### `single-agent_task_query`

Concrete, executable task queries designed to be solved by a **single agent**. Queries are derived directly from agent documentation.

### `multi-agent_task_query`

Executable task queries that require the **combination of multiple agents** to complete the task.

### `task_description`

Higher-level, abstract task descriptions obtained by grouping and abstracting single-agent task queries. Useful for evaluating agent search under more realistic, open-ended user intents.

---

## Data Fields

- `id`: Unique identifier for each task.
- `task`: Task content.
- `labels`: Binary retrieval labels.
- `ranking_labels`: Graded ranking labels.
- `ref_agents`: Reference agents used to generate the task.
- `ref_subtasks`: Associated subtasks (multi-agent task query and task description).
- `rubric`: Subtask selection rubric (task description only).
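
The card distinguishes binary retrieval labels (`labels`) from graded ranking labels (`ranking_labels`) but does not state how either field is encoded. The following is only a minimal sketch of how the two label types are commonly scored, assuming each has already been converted into a plain dict mapping an agent ID to a relevance value. The function names, the toy agent IDs, and the choice of recall@k and nDCG@k with linear gain are illustrative assumptions, not the benchmark's official metrics.

```python
import math


def recall_at_k(ranked_ids, binary_labels, k):
    """Fraction of relevant agents that appear in the top-k of a ranking.

    ASSUMPTION: binary_labels is a dict {agent_id: 0 or 1}; the real
    encoding of the `labels` field is not documented in the card.
    """
    relevant = {agent for agent, y in binary_labels.items() if y > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for agent in ranked_ids[:k] if agent in relevant)
    return hits / len(relevant)


def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_ids, graded_labels, k):
    """nDCG@k for a ranking scored against graded relevance.

    ASSUMPTION: graded_labels is a dict {agent_id: non-negative grade};
    agents missing from the dict are treated as grade 0.
    """
    gains = [graded_labels.get(agent, 0) for agent in ranked_ids[:k]]
    ideal = sorted(graded_labels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy data (hypothetical agent IDs, not taken from the dataset).
ranked = ["agent_b", "agent_a", "agent_c"]
binary = {"agent_a": 1, "agent_b": 0, "agent_c": 1}
graded = {"agent_a": 2, "agent_b": 0, "agent_c": 1}

print(recall_at_k(ranked, binary, k=2))  # 0.5: one of two relevant agents is in the top 2
print(ndcg_at_k(ranked, graded, k=3))
```

Keeping the two metrics separate mirrors the retrieval-versus-reranking framing: recall@k only asks whether relevant agents were found, while nDCG@k also rewards placing higher-graded agents earlier.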

---

## Usage

```python
from datasets import load_dataset

# Single-agent task queries
ds = load_dataset("AgentSearch/AgentSearchBench-Tasks", "single-agent_task_query")

# Multi-agent task queries
ds = load_dataset("AgentSearch/AgentSearchBench-Tasks", "multi-agent_task_query")

# High-level task descriptions
ds = load_dataset("AgentSearch/AgentSearchBench-Tasks", "task_description")
```

---

## Related Datasets

| Dataset | Description |
|---------|-------------|
| [AgentSearchBench-Agents](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Agents) | The AgentBase dataset: 9,759 real-world AI agents with metadata |
| [AgentSearchBench-Responses](https://huggingface.co/datasets/AgentSearch/AgentSearchBench-Responses) | 60K+ raw agent execution responses from the validation set |

---

## Citation

```bibtex
@article{}
```
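
As a complement to the Usage section, the sketch below encodes the file layout from the card's `configs` metadata and the row counts from the Dataset Statistics table, so that a downloaded copy can be checked against them. The helper names (`parquet_path`, `split_total`) and the constant names are hypothetical; only the config names, split names, path pattern, and counts come from the card.

```python
# Config and split names as listed in the card's `configs` metadata.
CONFIGS = ("single-agent_task_query", "multi-agent_task_query", "task_description")
SPLITS = ("validation", "test")

# Row counts copied from the Dataset Statistics table.
EXPECTED_ROWS = {
    ("validation", "task_description"): 259,
    ("validation", "single-agent_task_query"): 2452,
    ("validation", "multi-agent_task_query"): 500,
    ("test", "task_description"): 65,
    ("test", "single-agent_task_query"): 633,
    ("test", "multi-agent_task_query"): 100,
}
EXPECTED_TOTALS = {"validation": 3211, "test": 798}


def parquet_path(config, split):
    """Repository-relative parquet path for one config/split pair."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{config}/{split}-00000-of-00001.parquet"


def split_total(split):
    """Sum the per-config row counts for one split."""
    return sum(EXPECTED_ROWS[(split, config)] for config in CONFIGS)


for split in SPLITS:
    # The per-config counts should add up to the table's Total column.
    assert split_total(split) == EXPECTED_TOTALS[split]
    for config in CONFIGS:
        print(parquet_path(config, split), EXPECTED_ROWS[(split, config)])
```

After loading a configuration with `load_dataset`, comparing `len(ds[split])` with `EXPECTED_ROWS[(split, config)]` gives a quick check that the download matches the published statistics.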

Provided by:

AgentSearch
