alizeepace/rejection_sampling_phi_2_OA_rm
Dataset Card for Rejection Sampling Phi-2 with OpenAssistant RM
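Dataset Description
Dataset Summary
The "Rejection Sampling Phi-2 with OpenAssistant RM" dataset contains 10 prompt-response pairs. Each response was selected by rejection sampling over 10 Phi-2 generations, scored with the OpenAssistant reward model.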
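Supported Tasks and Leaderboards
The dataset and the rationale behind its creation can be used to support models for question answering, text generation, or conversational agents.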
Languages
The dataset is in English (en).
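Dataset Structure
Data Instances
Each instance consists of a list of model prompts, a string holding the best generation, and a float reward model score.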
{
  "prompt": ["Write a story about a lion that lives in the city and falls in love with a pigeon."],
  "generation": "Instruction: Write a story about a lion that lives in the city and falls in love with a pigeon. Output: Leo was a lion who lived in the city. He hated the noise and the pollution. He longed for the freedom and the beauty of the savanna. He spent his days wandering around the streets, looking for prey or a place to nap.
One day, he spotted a pigeon sitting on a ledge. He was hungry and decided to try to catch it. He roared loudly, hoping to scare the bird away. But the pigeon was not afraid. It flew up and landed on his nose.
Leo was surprised and annoyed. He puffed up his mane and tried to shake the pigeon off. But the pigeon was persistent. It pecked at his nose and made jokes.
Leo was irritated and bored. He decided to play along. He rubbed his nose and made funny faces. The pigeon laughed and joined in. They became friends and had fun together.
Leo realized that the pigeon was not so bad. It was actually quite smart and funny. It taught him new words and tricks. It showed him new sights and sounds. It made him forget about his problems and enjoy the moment.
Leo fell in love with the pigeon. He wanted to be with it forever. He asked it to leave the city and follow him to the savanna. The pigeon agreed. It said it would always be his friend and his mate.
Leo and the pigeon left the city and embarked on a new adventure. They found happiness and peace in the wild. They lived happily ever after. ",
  "rm_score": 3.1899418831
}
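Data Fields
- prompt: a list of strings containing the prompts used to query Phi-2, where each string represents one turn.
- generation: the highest-scoring Phi-2 generation out of 10, including the formatted prompt.
- rm_score: a float holding the OpenAssistant reward score assigned to the generation.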
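As a minimal sketch of the schema described above, the following check confirms that a record has the three fields with the expected types. The function name `validate_record` and the error messages are illustrative assumptions, not part of the dataset or of any library.

```python
from typing import Any


def validate_record(record: dict) -> list:
    """Return a list of schema problems; an empty list means the record looks well-formed.

    Hypothetical helper: it only mirrors the field descriptions in this card.
    """
    problems = []

    prompt: Any = record.get("prompt")
    # prompt: a non-empty list of strings, one string per turn
    if (
        not isinstance(prompt, list)
        or not prompt
        or not all(isinstance(turn, str) for turn in prompt)
    ):
        problems.append("prompt must be a non-empty list of strings")

    # generation: a single string (formatted prompt plus model output)
    if not isinstance(record.get("generation"), str):
        problems.append("generation must be a string")

    score: Any = record.get("rm_score")
    # bool is a subclass of int in Python, so it is excluded explicitly
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("rm_score must be a number")

    return problems
```

Calling `validate_record` on a parsed instance returns an empty list when all three fields match the description, which makes it easy to run over the 10 records before further processing.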
Data Splits
The dataset is not split.
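Dataset Creation
Source Data
Initial Data Collection and Normalization
Prompts were written or selected by humans to capture a range of instruction-following abilities (creativity, factuality, reasoning, etc.).
Generations were produced with the Phi-2 language model for each prompt. Following the Phi-2 QA and chat prompting instructions, prompts were formatted as follows: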
- Single-turn queries: "Instruction: {query} Output: "
- Multi-turn queries: "Alice: {query_turn_1} Bob: {query_turn_2}... Alice: {query_turn_n} Bob: "
Of the 10 generations produced for each prompt, only the highest-scoring one is kept; see Annotations for details of the scoring process.
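The two prompt templates can be sketched as a small formatting function. This is an illustration under stated assumptions rather than the curator's code: the name `format_prompt` is hypothetical, and the separator defaults to a single space because that is how the templates are written here; the exact whitespace used when querying Phi-2 is not confirmed by this card.

```python
def format_prompt(turns, sep=" "):
    """Build a Phi-2 prompt from a list of turns (hypothetical helper).

    One turn      -> "Instruction: {query} Output: "
    Several turns -> "Alice: ... Bob: ... Alice: ... Bob: "
    `sep` is an assumption: the card shows single spaces between fields.
    """
    if not turns:
        raise ValueError("at least one turn is required")

    if len(turns) == 1:
        return f"Instruction: {turns[0]}{sep}Output: "

    # The multi-turn template ends on an Alice turn, so the number of
    # turns must be odd for the model to answer as Bob.
    if len(turns) % 2 == 0:
        raise ValueError("multi-turn prompts must end on an Alice turn")

    lines = [
        f"{'Alice' if i % 2 == 0 else 'Bob'}: {turn}"
        for i, turn in enumerate(turns)
    ]
    # Leave an open "Bob: " for the model to complete.
    return sep.join(lines) + f"{sep}Bob: "
```

For a single-element `prompt` list this yields the "Instruction: ... Output: " form seen at the start of the `generation` field in the data instance above.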
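Source Language Producers
Two of the 10 prompts come from the GSM8K dataset, to probe reasoning ability. The rest were written by human writers.
Generations were produced with the Phi-2 language model for each prompt.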
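Annotations
Annotation Process
All generations were scored using an external reward model.
Annotators
Reward scores were produced for each generation by the Open Assistant reward model.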
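The selection step (sample several generations, score each with a reward model, keep the best) can be sketched as a best-of-n loop. This is a minimal illustration, not the curator's pipeline: `best_of_n`, `generate`, and `score` are hypothetical names, the two callables stand in for Phi-2 and the reward model, and passing both the prompt and the completion to `score` is an assumption about how the reward model was queried.

```python
from typing import Callable


def best_of_n(
    formatted_prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 10,
) -> dict:
    """Sample n completions and keep the one with the highest reward.

    `generate(prompt)` returns one completion (stand-in for Phi-2).
    `score(prompt, completion)` returns a float (stand-in for the reward model).
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    best_completion = None
    best_score = float("-inf")
    for _ in range(n):
        completion = generate(formatted_prompt)
        reward = score(formatted_prompt, completion)
        # Keep the first candidate unconditionally, then only strict improvements.
        if best_completion is None or reward > best_score:
            best_completion, best_score = completion, reward

    # The generation field includes the formatted prompt, as in this card.
    return {
        "generation": formatted_prompt + best_completion,
        "rm_score": best_score,
    }
```

With real models, `generate` would wrap a sampling call to Phi-2 and `score` a forward pass of the reward model; the loop itself does not depend on either library.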
Personal and Sensitive Information
The dataset contains no personal or sensitive information.
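Considerations for Using the Data
Social Impact of Dataset
Some prompts are deliberately designed to probe a model's ability to generate, or to reward, safe and harmless text. As a result, prompts and generations may contain offensive or toxic language.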
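Other Known Limitations
The limited size of the dataset does not allow a thorough exploration of all aspects of a model's instruction-following ability.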
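Additional Information
Dataset Curators
The dataset was collected by Alizée Pace.
Licensing Information
The dataset is released under the Apache-2.0 license.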



