
xbench/ScienceQA

Source: Hugging Face · Updated 2025-06-18 · Indexed 2025-07-05
Download link:
https://hf-mirror.com/datasets/xbench/ScienceQA
Summary:
xbench is an evergreen, contamination-free, real-world, domain-specific AI evaluation framework designed to measure both the intelligence frontier and the real-world utility of AI systems. It features two complementary tracks: AGI Tracking, which measures core model capabilities such as reasoning, tool use, and memory; and Profession Aligned, a new class of evals grounded in workflows, environments, and business KPIs and co-designed with domain experts. The dataset includes the source data and evaluation code for two AGI Tracking benchmarks: ScienceQA and DeepSearch.
Provider: xbench