tulu-v2-sft-long-mixture
收藏魔搭社区2025-07-16 更新2025-05-31 收录
下载链接:
https://modelscope.cn/datasets/allenai/tulu-v2-sft-long-mixture
下载链接
链接失效反馈官方服务:
资源简介:
This is a recreation of the [tulu-v2-sft-mixture](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture), **without** splitting ShareGPT dataset into chunks of max 4096 tokens. This might be interesting to people who are doing long-context finetuning.
Please refer to the original tulu-v2-sft-mixture for the details of this dataset mixture.
### License
We are releasing this dataset under the terms of [ODC-BY](https://opendatacommons.org/licenses/by/1-0/). By using this, you are also bound by the [Common Crawl terms of use](https://commoncrawl.org/terms-of-use/) in respect of the content contained in the dataset.
本数据集为[tulu-v2-sft-mixture](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture)的复刻版本,**未**将ShareGPT数据集切分为最大4096 Token的片段。该数据集对于开展长上下文微调的研究人员颇具参考价值。
请查阅原始tulu-v2-sft-mixture数据集以了解该混合数据集的详细信息。
### 许可证
本数据集依据[ODC-BY](https://opendatacommons.org/licenses/by/1-0/)协议发布。使用本数据集时,您还需遵守[Common Crawl使用条款](https://commoncrawl.org/terms-of-use/)中与数据集所含内容相关的规定。
提供机构:
maas
创建时间:
2025-05-28



