Source: ModelScope community (updated 2026-05-15, indexed 2024-08-31)
Download link: https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k
# LongWriter-6k
<p align="center">
🤗 <a href="https://huggingface.co/datasets/THUDM/LongWriter-6k" target="_blank">[LongWriter Dataset] </a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[Github Repo]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter Paper]</a>
</p>
The **LongWriter-6k** dataset contains 6,000 SFT examples with ultra-long outputs, ranging from 2k to 32k words in length, in both English and Chinese. The data can be used to train LLMs to extend their maximum output window size to 10,000+ words.
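To check the stated 2k to 32k word range against a local copy of the data, a rough length check can be sketched as below. This is a minimal sketch under assumptions, not the authors' tooling: the `messages` list with `role`/`content` keys is an assumed record schema, counting each CJK character as one word is an assumed convention for Chinese text, and the helper names `count_words`, `output_text`, and `in_stated_range` are hypothetical.

```python
import re

# Assumption: one CJK ideograph counts as one "word" for Chinese text.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Latin-script tokens, allowing internal apostrophes and hyphens ("don't").
_LATIN_WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def count_words(text: str) -> int:
    """Rough length: CJK characters plus Latin-script word tokens."""
    cjk_chars = len(_CJK_CHAR.findall(text))
    # Blank out CJK characters so adjacent Latin tokens stay separated.
    latin_words = len(_LATIN_WORD.findall(_CJK_CHAR.sub(" ", text)))
    return cjk_chars + latin_words


def output_text(record: dict) -> str:
    """Return the last assistant turn of a record.

    The `messages` / `role` / `content` layout is an assumption about the
    record schema; adjust the keys to match the actual files.
    """
    for message in reversed(record.get("messages", [])):
        if message.get("role") == "assistant":
            return message.get("content", "")
    return ""


def in_stated_range(record: dict, low: int = 2_000, high: int = 32_000) -> bool:
    """True if the output length falls inside the range stated on the card."""
    return low <= count_words(output_text(record)) <= high
```

Applying `in_stated_range` to each record gives a quick sanity check of output lengths. The counts are approximate, since the card does not say how words were counted.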
## All Models
We have open-sourced the following models trained on LongWriter-6k:
|Model|Huggingface Repo|Description|
|---|---|---|
|**LongWriter-glm4-9b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongWriter-glm4-9b) | **GLM-4-9B** with an extended 10k+ word output context window |
|**LongWriter-llama3.1-8b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongWriter-llama3.1-8b) | **Llama-3.1-8B** with an extended 10k+ word output context window |
## Citation
If you find our work useful, please consider citing LongWriter:
```
@article{bai2024longwriter,
title={LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs},
author={Yushi Bai and Jiajie Zhang and Xin Lv and Linzhi Zheng and Siqi Zhu and Lei Hou and Yuxiao Dong and Jie Tang and Juanzi Li},
journal={arXiv preprint arXiv:2408.07055},
year={2024}
}
```
Provider: maas
Created: 2024-08-15



