ali5341/scitldr-chat-format
收藏Hugging Face2026-04-30 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/ali5341/scitldr-chat-format
下载链接
链接失效反馈官方服务:
资源简介:
SciTLDR Chat-Format数据集是一个用于摘要SFT的聊天格式准备数据集。它基于`allenai/scitldr`数据集,专注于科学论文的极端摘要(TLDR生成)。数据集包含训练和验证文件,支持多种目标策略(如`target-policy first`和`target-policy all`)。每个JSONL行包含`messages`(用户指令、论文标题和内容,以及助理的TLDR摘要句子)和`meta`(分割、来源变体、论文ID、目标索引/计数)。数据集的目标是生成一句话的科学TLDR摘要,用户输入由论文`标题`和`来源`构建,助理目标来自`target`。
The SciTLDR Chat-Format dataset is a chat-format preparation of SciTLDR for summarization SFT. It is based on the `allenai/scitldr` dataset and focuses on extreme summarization of scientific papers (TLDR generation). The dataset includes training and validation files and supports various target policies (e.g., `target-policy first` and `target-policy all`). Each JSONL row contains `messages` (user instruction, paper title and content, and assistants TLDR summary sentence) and `meta` (split, source variant, paper_id, target index/count). The datasets task is one-sentence scientific TLDR generation, with user input built from paper `title` and `source`, and assistant target drawn from `target`.
提供机构:
ali5341



