five

LongWriter-6k

收藏
魔搭社区2026-05-15 更新2024-08-31 收录
下载链接:
https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k
下载链接
链接失效反馈
官方服务:
资源简介:
# LongWriter-6k <p align="center"> 🤗 <a href="https://huggingface.co/datasets/THUDM/LongWriter-6k" target="_blank">[LongWriter Dataset] </a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[Github Repo]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter Paper]</a> </p> **LongWriter-6k** dataset contains 6,000 SFT data with ultra-long output ranging from 2k-32k words in length (both English and Chinese). The data can support training LLMs to extend their maximum output window size to 10,000+ words. ## All Models We open-sourced the following list of models trained on LongWriter-6k: |Model|Huggingface Repo|Description| |---|---|---| |**LongWriter-glm4-9b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongWriter-glm4-9b) | **GLM-4-9B** with an extended 10k+ word output context window | |**LongWriter-llama3.1-8b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongWriter-llama3.1-8b) | **Llama-3.1-8B** with an extended 10k+ word output context window | ## Citation If you find our work useful, please consider citing LongWriter: ``` @article{bai2024longwriter, title={LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs}, author={Yushi Bai and Jiajie Zhang and Xin Lv and Linzhi Zheng and Siqi Zhu and Lei Hou and Yuxiao Dong and Jie Tang and Juanzi Li}, journal={arXiv preprint arXiv:2408.07055}, year={2024} } ```

# LongWriter-6k 数据集 <p align="center"> 🤗 <a href="https://huggingface.co/datasets/THUDM/LongWriter-6k" target="_blank">[LongWriter 数据集]</a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[Github 仓库]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter 论文]</a> </p> **LongWriter-6k** 数据集包含6000条监督微调(Supervised Fine-Tuning,SFT)数据,其输出内容超长,长度介于2000至32000词之间,涵盖英文与中文两种语言。该数据集可用于训练大语言模型(Large Language Model,LLM),将其最大输出上下文窗口扩展至10000词以上。 ## 已开源模型 | 模型名称 | Huggingface 仓库地址 | 描述 | |---|---|---| | **LongWriter-glm4-9b** | [🤗 Huggingface 仓库](https://huggingface.co/THUDM/LongWriter-glm4-9b) | 基于**GLM-4-9B**,将输出上下文窗口扩展至10000词以上 | | **LongWriter-llama3.1-8b** | [🤗 Huggingface 仓库](https://huggingface.co/THUDM/LongWriter-llama3.1-8b) | 基于**Llama-3.1-8B**,将输出上下文窗口扩展至10000词以上 | ## 引用方式 如果您认为本工作对您有所帮助,请引用LongWriter: @article{bai2024longwriter, title={LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs}, author={Yushi Bai and Jiajie Zhang and Xin Lv and Linzhi Zheng and Siqi Zhu and Lei Hou and Yuxiao Dong and Jie Tang and Juanzi Li}, journal={arXiv preprint arXiv:2408.07055}, year={2024} }
提供机构:
maas
创建时间:
2024-08-15
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作