xbench/ScienceQA

Name: xbench/ScienceQA
Creator: xbench
Published: 2025-06-18 17:06:35
License: No description available

Source: Hugging Face
Updated: 2025-06-18
Indexed: 2025-07-05

Download link:

https://hf-mirror.com/datasets/xbench/ScienceQA

Description:

xbench is an evergreen, contamination-free, real-world, domain-specific AI evaluation framework designed to measure both the intelligence frontier and the real-world utility of AI systems. It has two complementary tracks. AGI Tracking measures core model capabilities such as reasoning, tool use, and memory. Profession Aligned is a new class of evaluations grounded in workflows, environments, and business KPIs, co-designed with domain experts. The dataset open-sources the source data and evaluation code for two AGI Tracking benchmarks: ScienceQA and DeepSearch.

Provider:

xbench
