five

ali5341/scitldr-chat-format

收藏
Hugging Face2026-04-30 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/ali5341/scitldr-chat-format
下载链接
链接失效反馈
官方服务:
资源简介:
SciTLDR Chat-Format数据集是一个用于摘要SFT的聊天格式准备数据集。它基于`allenai/scitldr`数据集,专注于科学论文的极端摘要(TLDR生成)。数据集包含训练和验证文件,支持多种目标策略(如`target-policy first`和`target-policy all`)。每个JSONL行包含`messages`(用户指令、论文标题和内容,以及助理的TLDR摘要句子)和`meta`(分割、来源变体、论文ID、目标索引/计数)。数据集的目标是生成一句话的科学TLDR摘要,用户输入由论文`标题`和`来源`构建,助理目标来自`target`。

The SciTLDR Chat-Format dataset is a chat-format preparation of SciTLDR for summarization SFT. It is based on the `allenai/scitldr` dataset and focuses on extreme summarization of scientific papers (TLDR generation). The dataset includes training and validation files and supports various target policies (e.g., `target-policy first` and `target-policy all`). Each JSONL row contains `messages` (user instruction, paper title and content, and assistants TLDR summary sentence) and `meta` (split, source variant, paper_id, target index/count). The datasets task is one-sentence scientific TLDR generation, with user input built from paper `title` and `source`, and assistant target drawn from `target`.
提供机构:
ali5341
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作