caskcsg/LongMagpie_64k_dataset
收藏Hugging Face2025-08-02 更新2025-07-05 收录
下载链接:
https://hf-mirror.com/datasets/caskcsg/LongMagpie_64k_dataset
下载链接
链接失效反馈官方服务:
资源简介:
LongMagpie数据集是一个自动生成的长上下文指令数据集,用于训练长上下文大型语言模型。该数据集由LongMagpie框架自动生成,无需人工标注。数据集包含从fineweb-edu数据集中提取的上下文、LongMagpie生成的查询和答案。数据集分为单文档和多个文档版本,并提供了用于训练长文本和平衡长文本和短文本性能的版本。
The LongMagpie dataset is an automatically generated long-context instruction dataset for training long-context large language models. The dataset is synthesized by the LongMagpie framework without human annotation. It includes contexts extracted from the fineweb-edu dataset, queries generated by LongMagpie, and answers. The dataset is available in single-document and multi-document versions, and provides versions for training long-text and balancing long-text and short-text performance.
提供机构:
caskcsg



