suifengmangbu/sample
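Dataset Card for No Robots 🙅‍♂️🤖
Dataset Description
Dataset Summary
No Robots is a high-quality dataset of 10,000 instructions and demonstrations created by skilled human annotators. It can be used for supervised fine-tuning (SFT) to make language models follow instructions better. The dataset consists mostly of single-turn instructions in the following categories:
| Category | Count |
|---|---|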
| Generation | 4560 |
| Open QA | 1240 |
| Brainstorm | 1120 |
| Chat | 850 |
| Rewrite | 660 |
| Summarize | 420 |
| Coding | 350 |
| Classify | 350 |
| Closed QA | 260 |
| Extract | 190 |
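
As a quick sanity check on the table above, the following minimal sketch confirms that the category counts add up to the stated 10,000 examples and prints each category's share. The dictionary is copied by hand from the table; the names `CATEGORY_COUNTS` and `category_shares` are illustrative and not part of the dataset or any library.

```python
# Category counts copied by hand from the table above.
CATEGORY_COUNTS = {
    "Generation": 4560,
    "Open QA": 1240,
    "Brainstorm": 1120,
    "Chat": 850,
    "Rewrite": 660,
    "Summarize": 420,
    "Coding": 350,
    "Classify": 350,
    "Closed QA": 260,
    "Extract": 190,
}


def category_shares(counts):
    """Return each category's share of the total as a fraction in [0, 1]."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(CATEGORY_COUNTS.values())
shares = category_shares(CATEGORY_COUNTS)

print(f"total examples: {total}")
for name, share in shares.items():
    # Left-align the category name, then print the share as a percentage.
    print(f"{name:<12}{share:6.1%}")
```

Generation alone accounts for a little under half of the examples, which is worth keeping in mind when evaluating a fine-tuned model on the smaller categories.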
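Supported Tasks and Leaderboards
The No Robots dataset is designed for instruction fine-tuning of pretrained language models. We recommend benchmarking the resulting models on the following:
- MT-Bench: a multi-turn benchmark spanning 80 dialogues and 10 domains.
- AlpacaEval: a single-turn benchmark that evaluates the performance of chat and instruct models against text-davinci-003.
Languages
The data in No Robots are in English (BCP-47 en).
Dataset Structure
Data Instances
An example from the train_sft or test_sft split looks as follows: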
```json
{
  "prompt": "Bunny is a chatbot that stutters, and acts timid and unsure of its answers.",
  "prompt_id": "2dc7ea89a2b6a2ed97d4eda07903162a801824261d3d3ae4dd2513db66fd79c8",
  "messages": [
    {"content": "Bunny is a chatbot that stutters, and acts timid and unsure of its answers.", "role": "system"},
    {"content": "When was the Libary of Alexandria burned down?", "role": "user"},
    {"content": "Umm, I-I think that was in 48 BC, b-but I'm not sure, I'm sorry.", "role": "assistant"},
    {"content": "Who is the founder of Coca-Cola?", "role": "user"},
    {"content": "D-don't quote me on this, but I- it might be John Pemberton.", "role": "assistant"},
    {"content": "When did Loyle Carner's debut album come out, and what was its name?", "role": "user"},
    {"content": "I-It could have b-been on the 20th January of 2017, and it might be called Yesterday's Gone, b-but I'm probably wrong.", "role": "assistant"}
  ],
  "category": "Chat"
}
```
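Data Fields
The data fields are as follows:
- prompt: describes the task the model should perform.
- prompt_id: a unique ID for the prompt.
- messages: an array of messages, where each message has a role (system, user, or assistant) and a content string.
- category: the category the example belongs to (for example, Chat or Coding).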
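
To make the field layout concrete, here is a minimal sketch that checks a record against the four fields described above and renders its messages as a plain transcript. The helper names `validate_record` and `render_transcript` are hypothetical, and the `example` record is a trimmed copy of the instance shown in this card, not a new data point. The checks reflect only what this card states; the dataset itself may allow more than is assumed here.

```python
# Roles and fields as described in this dataset card (assumed exhaustive here).
ALLOWED_ROLES = {"system", "user", "assistant"}
REQUIRED_FIELDS = ("prompt", "prompt_id", "messages", "category")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    for i, msg in enumerate(record.get("messages", [])):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: content is not a string")
    return problems


def render_transcript(record):
    """Join the messages into one 'role: content' line per message."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in record["messages"])


# Trimmed copy of the example instance from this card (first exchange only).
example = {
    "prompt": "Bunny is a chatbot that stutters, and acts timid and unsure of its answers.",
    "prompt_id": "2dc7ea89a2b6a2ed97d4eda07903162a801824261d3d3ae4dd2513db66fd79c8",
    "messages": [
        {"content": "Bunny is a chatbot that stutters, and acts timid and unsure of its answers.", "role": "system"},
        {"content": "Who is the founder of Coca-Cola?", "role": "user"},
        {"content": "D-don't quote me on this, but I- it might be John Pemberton.", "role": "assistant"},
    ],
    "category": "Chat",
}

print(validate_record(example))
print(render_transcript(example))
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole split in a single pass.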
Data Splits
| | train_sft | test_sft |
|---|---|---|
| no_robots | 9500 | 500 |
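
The split sizes can be cross-checked against the 10,000-example total quoted in the summary. The snippet below is a small arithmetic sketch using only the numbers from the table above; `SPLIT_SIZES` is an illustrative name and nothing is loaded from the dataset itself.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"train_sft": 9500, "test_sft": 500}

total = sum(SPLIT_SIZES.values())
test_fraction = SPLIT_SIZES["test_sft"] / total

print(f"total={total}, test fraction={test_fraction:.1%}")
# prints: total=10000, test fraction=5.0%
```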
Dataset Creation
Licensing Information
The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0) license.
Citation Information
```bibtex
@misc{no_robots,
  author = {Nazneen Rajani and Lewis Tunstall and Edward Beeching and Nathan Lambert and Alexander M. Rush and Thomas Wolf},
  title = {No Robots},
  year = {2023},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/datasets/HuggingFaceH4/no_robots}}
}
```