five

THUDM/LongWriter-6k

收藏
Hugging Face2024-08-14 更新2025-04-08 收录
下载链接:
https://hf-mirror.com/datasets/THUDM/LongWriter-6k
下载链接
链接失效反馈
官方服务:
资源简介:
--- task_categories: - text-generation language: - en - zh tags: - Long Context - sft - writing size_categories: - 1K<n<10K license: apache-2.0 --- # LongWriter-6k <p align="center"> 🤗 <a href="https://huggingface.co/datasets/THUDM/LongWriter-6k" target="_blank">[LongWriter Dataset] </a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[Github Repo]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter Paper]</a> </p> **LongWriter-6k** dataset contains 6,000 SFT data with ultra-long output ranging from 2k-32k words in length (both English and Chinese). The data can support training LLMs to extend their maximum output window size to 10,000+ words. ## All Models We open-sourced the following list of models trained on LongWriter-6k: |Model|Huggingface Repo|Description| |---|---|---| |**LongWriter-glm4-9b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongWriter-glm4-9b) | **GLM-4-9B** with an extended 10k+ word output context window | |**LongWriter-llama3.1-8b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongWriter-llama3.1-8b) | **Llama-3.1-8B** with an extended 10k+ word output context window | ## Citation If you find our work useful, please consider citing LongWriter: ``` @article{bai2024longwriter, title={LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs}, author={Yushi Bai and Jiajie Zhang and Xin Lv and Linzhi Zheng and Siqi Zhu and Lei Hou and Yuxiao Dong and Jie Tang and Juanzi Li}, journal={arXiv preprint arXiv:2408.07055}, year={2024} } ```
提供机构:
THUDM
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作