SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models

Name: SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models
Creator: figshare
Published: 2026-04-08 07:01:53
License: No description available

Source: DataCite Commons. Updated: 2026-04-08. Indexed: 2026-04-25.

Download link:

https://springernature.figshare.com/articles/dataset/SciCUEval_A_Comprehensive_Dataset_for_Evaluating_Scientific_Context_Understanding_in_Large_Language_Models/29924687

Description:

SciCUEval is a benchmark designed to evaluate the context understanding capabilities of Large Language Models (LLMs) in scientific domains. It addresses the lack of specialized benchmarks for these domains by providing:

1. 10 diverse datasets: spanning biology, chemistry, physics, biomedicine, and materials science.
2. Multiple data modalities: structured tables, knowledge graphs, and unstructured text.
3. Four core competencies: relevant information identification, information-absence detection, multi-source information integration, and context-aware inference.
4. Comprehensive evaluation: assessing state-of-the-art LLMs across various scientific tasks.

Provider:

figshare

Created:

2025-08-16
