SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models
Source: DataCite Commons | Updated: 2026-04-08 | Indexed: 2026-04-25
Download link:
https://springernature.figshare.com/articles/dataset/SciCUEval_A_Comprehensive_Dataset_for_Evaluating_Scientific_Context_Understanding_in_Large_Language_Models/29924687
Resource description:
SciCUEval is a benchmark for evaluating the context-understanding capabilities of large language models (LLMs) in scientific domains. It addresses the lack of benchmarks specialized for scientific domains by providing:
1. 10 Diverse Datasets: spanning biology, chemistry, physics, biomedicine, and materials science.
2. Multiple Data Modalities: structured tables, knowledge graphs, and unstructured text.
3. Four Core Competencies: relevant-information identification, information-absence detection, multi-source information integration, and context-aware inference.
4. Comprehensive Evaluation: assessment of state-of-the-art LLMs across a range of scientific tasks.
Provider:
figshare
Created:
2025-08-16