five

bijode8658-kudimi/sft-5

Source: Hugging Face. Updated 2025-12-13. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/bijode8658-kudimi/sft-5
Description:
The dataset contains multiple configurations covering document chunking, summary generation, and single-shot and multi-hop questions generated from the documents. Each configuration has a train split only, and the split size and number of examples differ by configuration.

Document-level features include the document ID, text content, file name, metadata (such as file size), document summary, and summarization model. Chunk features include chunks (chunk ID and chunk text) and multi-hop chunks (a list of chunk IDs and a list of chunk texts).

Question features include the question text, additional instructions, self-answer, estimated difficulty, self-assessed question type, generating model, thought process, raw response, citations, original question, question rewriting model, question rewriting rationale, raw question rewriting response, and source chunk IDs, among others.

The data prepared for light evaluation includes the question, additional instructions, ground truth answer, gold standard, choices, question category, kind, estimated difficulty, citations, document ID, chunk IDs, question generating model, chunks, document, and document summary, among others.
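Because this listing does not name the configurations, one way to explore them is to discover them at run time. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is publicly readable. The helper name `train_split_sizes` and its injectable `list_configs` / `load_train` parameters are hypothetical, introduced here only for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

REPO_ID = "bijode8658-kudimi/sft-5"  # dataset ID taken from this listing


def train_split_sizes(
    repo_id: str = REPO_ID,
    list_configs: Optional[Callable[[str], Iterable[str]]] = None,
    load_train: Optional[Callable[[str, str], object]] = None,
) -> Dict[str, int]:
    """Return {config_name: number_of_train_examples} for every configuration.

    `list_configs` and `load_train` are hypothetical hooks (not part of any
    library) so the function can be exercised without network access.
    """
    if list_configs is None or load_train is None:
        # Imported lazily so this module can be read without `datasets` installed.
        from datasets import get_dataset_config_names, load_dataset

        list_configs = list_configs or get_dataset_config_names
        load_train = load_train or (
            lambda repo, name: load_dataset(repo, name, split="train")
        )
    # Configuration names are discovered, not assumed: the listing does not give them.
    return {name: len(load_train(repo_id, name)) for name in list_configs(repo_id)}


# Example call (downloads data, so it is left commented out):
# for config, size in train_split_sizes().items():
#     print(f"{config}: {size} examples")
```

To fetch through the mirror given in the download link rather than the default hub, setting the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before importing `datasets` is a commonly used approach; whether it works for this repository has not been checked here.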
Provider:
bijode8658-kudimi