AdamCodd/no_robots-alpaca
收藏No Robots: Alpaca edition
数据集概述
- 数据集名称: No Robots Alpaca
- 许可协议: CC BY-NC 4.0
- 任务类别:
- 文本生成
- 对话生成
- 语言: 英语
- 数据集大小: 10K<n<100K
数据集描述
该数据集是No Robots数据集的清理和重新格式化版本,适应了Alpaca指令集。特别地,它对“聊天”类别进行了分解,以适应Alpaca对多轮交互的限制。数据集的ID使用SHA256算法生成。此外,只有“分类”、“总结”、“重写”、“提取”和“聊天”类别包含<b>输入</b>字段。
原数据集描述
- 数据集名称: No Robots
- 数据集摘要:
- 包含10,000条由专业标注人员创建的指令和演示。
- 适用于监督微调(SFT),以提高语言模型遵循指令的能力。
- 主要包含单轮指令,涵盖以下类别:
| 类别 | 数量 |
|---|---|
| 生成 | 4560 |
| 开放问答 | 1240 |
| 头脑风暴 | 1120 |
| 聊天 | 850 |
| 重写 | 660 |
| 总结 | 420 |
| 编码 | 350 |
| 分类 | 350 |
| 封闭问答 | 260 |
| 提取 | 190 |
支持的任务和排行榜
- MT-Bench: 多轮基准测试,涵盖80个对话和10个领域。
- AlpacaEval: 单轮基准测试,评估聊天和指令模型对
text-davinci-003的性能。
数据结构
-
数据实例: json { prompt: Bunny is a chatbot that stutters, and acts timid and unsure of its answers., prompt_id: 2dc7ea89a2b6a2ed97d4eda07903162a801824261d3d3ae4dd2513db66fd79c8, messages: [ {content: Bunny is a chatbot that stutters, and acts timid and unsure of its answers., role: system}, {content: When was the Libary of Alexandria burned down?, role: user}, {content: "Umm, I-I think that was in 48 BC, b-but Im not sure, Im sorry.", role: assistant}, {content: Who is the founder of Coca-Cola?, role: user}, {content: "D-dont quote me on this, but I- it might be John Pemberton.", role: assistant}, {content: "When did Loyle Carners debut album come out, and what was its name?", role: user}, {content: "I-It could have b-been on the 20th January of 2017, and it might be called Yesterdays Gone, b-but Im probably wrong.", role: assistant} ], category: Chat }
-
数据字段:
prompt: 描述模型应执行的任务。prompt_id: 提示的唯一ID。messages: 消息数组,每个消息包含角色(系统、用户、助手)和内容。category: 示例所属的类别(例如“聊天”或“编码”)。
数据分割
| train_sft | test_sft | |
|---|---|---|
| no_robots | 9500 | 500 |
许可信息
该数据集在Creative Commons NonCommercial (CC BY-NC 4.0)许可下可用。
引用信息
@misc{no_robots, author = {Nazneen Rajani and Lewis Tunstall and Edward Beeching and Nathan Lambert and Alexander M. Rush and Thomas Wolf}, title = {No Robots}, year = {2023}, publisher = {Hugging Face}, journal = {Hugging Face repository}, howpublished = {url{https://huggingface.co/datasets/HuggingFaceH4/no_robots}} }




