moss-002-sft-data
Source: ModelScope community. Updated 2025-12-05; indexed 2025-11-22.
Download link:
https://modelscope.cn/datasets/openmoss/moss-002-sft-data
Resource description:
# Dataset Card for "moss-002-sft-data"
## Dataset Description
- **Homepage:** [https://txsun1997.github.io/blogs/moss.html](https://txsun1997.github.io/blogs/moss.html)
- **Repository:** [https://github.com/OpenLMLab/MOSS](https://github.com/OpenLMLab/MOSS)
- **Total amount of disk used:** 2.16 GB
### Dataset Summary
An open-source conversational dataset used to train MOSS-002. The user prompts were expanded from a small set of human-written seed prompts, in a way similar to [Self-Instruct](https://arxiv.org/abs/2212.10560). The AI responses were generated with `text-davinci-003`. The user prompts in `en_harmlessness` come from the [Anthropic red teaming data](https://github.com/anthropics/hh-rlhf/tree/master/red-team-attempts).
### Data Splits
| name | \# samples |
|----------------------|-----------:|
| en_helpfulness.json | 419049 |
| en_honesty.json | 112580 |
| en_harmlessness.json | 38873 |
| zh_helpfulness.json | 447750 |
| zh_honesty.json | 142885 |
Provider:
maas
Created:
2025-10-23



