xbench/DeepSearch

Name: xbench/DeepSearch
Creator: xbench
Published: 2025-06-18 17:05:10
License: No description available

Hugging Face · Updated 2025-06-18 · Indexed 2025-07-05

Download link:

https://hf-mirror.com/datasets/xbench/DeepSearch

Resource summary:

xbench is an evergreen, contamination-free, real-world, domain-specific AI evaluation framework designed to measure both the intelligence frontier and the real-world utility of AI systems through two complementary tracks. The AGI Tracking track measures core model capabilities such as reasoning, tool use, and memory. The Profession Aligned track is co-designed with domain experts and built around real workflows, environments, and business KPIs. The source data and evaluation code for two AGI Tracking benchmarks, ScienceQA and DeepSearch, are open-sourced in this paper.

Provider:

xbench

Dataset Introduction

Background and Challenges

Background Overview

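xbench/DeepSearch is a Chinese-language dataset for evaluating the tool-use ability of AI models in search and information-retrieval scenarios. It is distributed in CSV format, contains fewer than 1K records, and includes evaluation results and accuracy figures for multiple models.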

The content above was collected and summarized by 遇见数据集.
