InternScience/SGI-Reasoning
Hugging Face | Updated 2025-12-30 | Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/InternScience/SGI-Reasoning
Description:
The SGI-Bench dataset is designed to evaluate the Scientific General Intelligence (SGI) of large language models (LLMs) through a scientist-aligned workflow. It spans 10 disciplines and includes over 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset features multimodal and reasoning tasks that cover several aspects of scientific inquiry: deliberation, conception, action, and perception. It is organized into separate tasks and includes a customizable agentic evaluation framework with multiple metrics. The README also details the dataset's features, splits, usage instructions, leaderboard results, and citation information.
Provider:
InternScience



