tldc/SciRIFF
Hugging Face | Updated 2025-12-19 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/tldc/SciRIFF
Overview:
The SciRIFF dataset contains 137K instruction-following demonstrations for 54 scientific literature understanding tasks. The tasks cover five core scientific literature understanding categories and span five domains. The dataset comes in three configurations that differ in maximum context length: 4096, 8192, and 16384. Each instance has the fields input, output, _instance_id (a unique instance ID), and metadata, whose subfields include task_family, domains, input_context, source_type, and output_context. The dataset is licensed under ODC-By and was created by repurposing existing scientific literature understanding datasets. The README also lists the source data for each SciRIFF task, including license details and links to the original datasets.
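The instance layout described above can be checked with a few lines of Python. This is a minimal sketch, not part of the dataset's own tooling: the helper name `missing_fields` and the example record are hypothetical, the value types are assumptions, and the field lists contain only the names mentioned in the description, which may not be exhaustive.

```python
from typing import Any

# Field names taken from the dataset description; the real schema may have more.
TOP_LEVEL_FIELDS = ("input", "output", "_instance_id", "metadata")
METADATA_SUBFIELDS = (
    "task_family",
    "domains",
    "input_context",
    "source_type",
    "output_context",
)
# Maximum context lengths of the three dataset configurations.
CONTEXT_LENGTHS = (4096, 8192, 16384)


def missing_fields(instance: dict[str, Any]) -> list[str]:
    """Return dotted names of the expected fields that are absent (hypothetical helper)."""
    missing = [name for name in TOP_LEVEL_FIELDS if name not in instance]
    metadata = instance.get("metadata")
    # Subfields are only checked when a metadata mapping is actually present.
    if isinstance(metadata, dict):
        missing += [
            f"metadata.{name}" for name in METADATA_SUBFIELDS if name not in metadata
        ]
    return missing


# Hypothetical record: every value is a placeholder, not real SciRIFF data.
example = {
    "input": "<task instructions and paper text>",
    "output": "<expected response>",
    "_instance_id": "<unique id>",
    "metadata": {
        "task_family": "<task family>",
        "domains": ["<domain>"],
        "input_context": "<input context>",
        "source_type": "<source type>",
        "output_context": "<output context>",
    },
}

print(missing_fields(example))
print(missing_fields({"input": "only an input"}))
```

A check like this could be run over each row after loading one of the three configurations, to confirm that a local copy matches the description before relying on the metadata subfields.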
Provided by:
tldc



