ChaoticNeutrals/Thudm-Long_Writer-4.4k-ShareGPT
收藏Hugging Face2024-11-13 更新2026-02-07 收录
下载链接:
https://hf-mirror.com/datasets/ChaoticNeutrals/Thudm-Long_Writer-4.4k-ShareGPT
下载链接
链接失效反馈官方服务:
资源简介:
---
license: other
language:
- en
---
Orginal Dataset from: https://huggingface.co/datasets/THUDM/LongWriter-6k
Converted, deslopped, refusals removed, grammar corrected, min-hash deduplicated using: https://github.com/The-Chaotic-Neutrals/ShareGPT-Formaxxing
@article{bai2024longwriter,
title={LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs},
author={Yushi Bai and Jiajie Zhang and Xin Lv and Linzhi Zheng and Siqi Zhu and Lei Hou and Yuxiao Dong and Jie Tang and Juanzi Li},
journal={arXiv preprint arXiv:2408.07055},
year={2024}
}
---
许可证:其他
语言:
- 英语
---
原始数据集来源:https://huggingface.co/datasets/THUDM/LongWriter-6k
本数据集经格式转换、数据清洗、拒答内容移除、语法修正,并通过最小哈希(min-hash)算法完成去重,所用处理工具为:https://github.com/The-Chaotic-Neutrals/ShareGPT-Formaxxing
@article{bai2024longwriter,
title={LongWriter:解锁长上下文大语言模型(Large Language Model,LLM)的万字级文本生成能力},
author={白宇石、张家杰、吕鑫、郑林芝、朱思琪、侯磊、董玉霄、唐杰、李娟子},
journal={arXiv预印本 arXiv:2408.07055},
year={2024}
}
提供机构:
ChaoticNeutrals



