Long Data Collections
arXiv, indexed 2025-09-30
Download link:
https://huggingface.co/datasets/togethercomputer/Long-Data-Collections
Description:
This collection gathers long-context datasets for training models on sequences of up to 32k tokens. Some of the datasets are also used to improve long-context instruction tuning. The listed task is instruction tuning.



