ScaleAI/SciPredict

Name: ScaleAI/SciPredict
Creator: ScaleAI
Published: 2026-01-15 18:53:59
License: not specified

Source: Hugging Face. Updated: 2026-01-15. Indexed: 2026-02-07.

Download link:

https://hf-mirror.com/datasets/ScaleAI/SciPredict

Description:

SciPredict is a benchmark that evaluates whether AI systems can predict experimental outcomes in physics, biology, and chemistry. It comprises 405 questions derived from empirical studies published after March 2025 and spans 33 subdomains: 9 in physics, 10 in chemistry, and 14 in biology. With model responses included, the dataset contains 5,716 rows. Questions come in three formats: multiple-choice, free-format, and numerical. Key fields are DOMAIN (scientific domain), FIELD (specific field), PQ_FORMAT (question format), TITLE (paper title), URL (paper URL), PUBLISHING_DATE (publication date), EXPERIMENTAL_SETUP (description of the experimental setup), MEASUREMENT_TAKEN (what was measured), OUTCOME_PREDICTION_QUESTION (the prediction task), GTA (ground-truth answer), BACKGROUND_KNOWLEDGE (expert-curated background knowledge), and RELATED_PAPERS_DATA (information on related papers). Key findings: model accuracy ranges from 14% to 26%, compared with roughly 20% for human experts; models are poorly calibrated; providing background knowledge improves performance; and question format affects performance.
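Because the 5,716 rows include model responses, a question can appear in more than one row. The sketch below shows one way to collapse rows back to unique questions and tally them by DOMAIN and PQ_FORMAT. It uses invented placeholder rows rather than real SciPredict records, and it assumes that the pair (TITLE, OUTCOME_PREDICTION_QUESTION) identifies a question; the description does not name an ID column, so treat that key as a guess. The helper names `unique_questions` and `count_by` are hypothetical.

```python
from collections import Counter

# Toy rows that reuse the field names from the description.
# The values are invented placeholders, not real SciPredict records.
rows = [
    {"DOMAIN": "Physics", "PQ_FORMAT": "Numerical",
     "TITLE": "Paper A", "OUTCOME_PREDICTION_QUESTION": "Q1"},
    # Same question again, standing in for a second model response row.
    {"DOMAIN": "Physics", "PQ_FORMAT": "Numerical",
     "TITLE": "Paper A", "OUTCOME_PREDICTION_QUESTION": "Q1"},
    {"DOMAIN": "Biology", "PQ_FORMAT": "Multiple-choice",
     "TITLE": "Paper B", "OUTCOME_PREDICTION_QUESTION": "Q2"},
    {"DOMAIN": "Chemistry", "PQ_FORMAT": "Free-format",
     "TITLE": "Paper C", "OUTCOME_PREDICTION_QUESTION": "Q3"},
]


def unique_questions(rows):
    """Keep the first row for each (TITLE, OUTCOME_PREDICTION_QUESTION) pair.

    Assumption: that pair identifies a question.
    """
    first_seen = {}
    for row in rows:
        key = (row["TITLE"], row["OUTCOME_PREDICTION_QUESTION"])
        first_seen.setdefault(key, row)
    return list(first_seen.values())


def count_by(rows, field):
    """Count rows per distinct value of the given field."""
    return Counter(row[field] for row in rows)


questions = unique_questions(rows)
domain_counts = count_by(questions, "DOMAIN")
format_counts = count_by(questions, "PQ_FORMAT")

# Subdomain counts as stated in the description: 9 + 10 + 14 = 33.
subdomains = {"Physics": 9, "Chemistry": 10, "Biology": 14}
total_subdomains = sum(subdomains.values())

print(len(questions), dict(domain_counts), dict(format_counts), total_subdomains)
```

To run the same tally on the real data, the rows could be loaded with the Hugging Face `datasets` library, for example `load_dataset("ScaleAI/SciPredict")`. The split and configuration names are not given here, so check the dataset page before relying on them.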

Provider:

ScaleAI
