
fuzhou-jiang/moss-002-sft-data

Source: Hugging Face. Updated 2026-01-06; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/fuzhou-jiang/moss-002-sft-data
Resource summary:

---
license: cc-by-4.0
task_categories:
- conversational
- text-generation
language:
- en
- zh
size_categories:
- 1M<n<10M
---

# Dataset Card for "moss-002-sft-data"

## Dataset Description

- **Homepage:** [https://txsun1997.github.io/blogs/moss.html](https://txsun1997.github.io/blogs/moss.html)
- **Repository:** [https://github.com/OpenLMLab/MOSS](https://github.com/OpenLMLab/MOSS)
- **Total disk space used:** 2.16 GB

### Dataset Summary

An open-source conversational dataset that was used to train MOSS-002. The user prompts were expanded from a small set of human-written seed prompts, in a manner similar to [Self-Instruct](https://arxiv.org/abs/2212.10560). The AI responses were generated with `text-davinci-003`. The user prompts in `en_harmlessness` are taken from the [Anthropic red-teaming data](https://github.com/anthropics/hh-rlhf/tree/master/red-team-attempts).

### Data Splits

| File                 | Number of samples |
|----------------------|------------------:|
| en_helpfulness.json  |            419049 |
| en_honesty.json      |            112580 |
| en_harmlessness.json |             38873 |
| zh_helpfulness.json  |            447750 |
| zh_honesty.json      |            142885 |
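The split table can be cross-checked against the card's `1M<n<10M` size category. The sketch below uses only the counts printed in the table (it does not download or parse the JSON files). The names `SPLIT_COUNTS` and `totals_by_language` are hypothetical, and grouping by the `en_`/`zh_` file-name prefix is an assumption based on the naming pattern, not something the card documents.

```python
# Per-file sample counts copied from the Data Splits table above.
# Hypothetical helper data for illustration; not part of the dataset itself.
SPLIT_COUNTS = {
    "en_helpfulness.json": 419049,
    "en_honesty.json": 112580,
    "en_harmlessness.json": 38873,
    "zh_helpfulness.json": 447750,
    "zh_honesty.json": 142885,
}


def totals_by_language(counts: dict[str, int]) -> dict[str, int]:
    """Sum sample counts by the language prefix of each file name.

    Assumes (from the naming pattern) that the text before the first
    underscore is the language code, e.g. 'en' or 'zh'.
    """
    totals: dict[str, int] = {}
    for name, n in counts.items():
        lang = name.split("_", 1)[0]
        totals[lang] = totals.get(lang, 0) + n
    return totals


if __name__ == "__main__":
    per_lang = totals_by_language(SPLIT_COUNTS)
    grand_total = sum(SPLIT_COUNTS.values())
    print(per_lang)     # {'en': 570502, 'zh': 590635}
    print(grand_total)  # 1161137
    # The card's size category is 1M<n<10M; confirm the table agrees.
    assert 1_000_000 < grand_total < 10_000_000
```

Summed, the five files hold 1,161,137 samples (570,502 English and 590,635 Chinese), which falls inside the stated size category.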
Provider:
fuzhou-jiang