tulu-2.5-prompts
ModelScope Community. Updated 2025-08-08; indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/tulu-2.5-prompts
# Tulu 2.5 Prompts Dataset
This dataset contains the set of prompts used to train the PPO models described in [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
It contains only the prompts used during PPO training, with no responses or preference labels.
## Dataset Details
The dataset contains the following prompt subsets:
- gsm8k_prompts: Prompts taken from the [GSM8K train split](https://huggingface.co/datasets/openai/gsm8k).
- ultrafeedback_prompts: The prompts from the [cleaned UltraFeedback](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) dataset.
- math_prompts: Prompts mined from [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback), [WildChat](https://huggingface.co/datasets/allenai/WildChat), and [LMSYS 1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) by prompting [Tulu 2 70B](https://huggingface.co/allenai/tulu-2-70b) to identify math-related examples. See the appendix of [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279) for more details.
- ultrafeedback_code_math_prompts: Code prompts, mined with the same method as the math prompts but targeting code, combined with the UltraFeedback and math prompt sets. This is the 'mixed' prompt set used in [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279) when exploring the effect of prompts.
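The mining and mixing steps described above can be sketched as an LLM-as-classifier filter followed by a merge. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `classify` callable is a hypothetical stand-in for prompting Tulu 2 70B (the real classification prompt is in the paper's appendix and is not reproduced here), the helper names `mine_prompts` and `build_mixed_set` are hypothetical, and exact-string deduplication is an assumption of this sketch.

```python
import random
from typing import Callable, Iterable, List


def mine_prompts(candidates: Iterable[str], classify: Callable[[str], bool]) -> List[str]:
    """Keep only candidate prompts that the classifier flags as in-domain.

    `classify` is a hypothetical stand-in for asking a judge model
    (Tulu 2 70B in the paper) whether a prompt is, e.g., math-related.
    """
    return [p for p in candidates if classify(p)]


def build_mixed_set(*prompt_sets: Iterable[str], seed: int = 0) -> List[str]:
    """Concatenate prompt sets, drop exact duplicates, then shuffle.

    Exact-string deduplication is an assumption of this sketch.
    """
    seen = set()
    merged: List[str] = []
    for prompts in prompt_sets:
        for p in prompts:
            if p not in seen:
                seen.add(p)
                merged.append(p)
    random.Random(seed).shuffle(merged)  # deterministic shuffle for reproducibility
    return merged


# Toy data and keyword heuristics standing in for the LLM judge.
pool = [
    "What is 12 * 7?",
    "Write a Python function that reverses a string.",
    "Tell me a story about a dragon.",
]
math_prompts = mine_prompts(pool, lambda p: any(c.isdigit() for c in p))
code_prompts = mine_prompts(pool, lambda p: "function" in p.lower())
ultrafeedback = ["Tell me a story about a dragon.", "Summarize this article."]
mixed = build_mixed_set(ultrafeedback, math_prompts, code_prompts)
```

In practice the keyword lambdas would be replaced by a call to a judge model; keeping `classify` as a plain callable makes that swap independent of the merge logic.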
Further details:
- **Curated by:** @hamishivi
- **Language(s) (NLP):** English
- **License:** ODC-BY. Note that GSM8K and UltraFeedback are licensed under MIT, LMSYS under a custom license, and WildChat under the Ai2 low-risk impact license.
## Uses
This dataset is intended for research on training models with online RLHF methods such as PPO, which need only unlabelled prompts: responses are generated by the policy and scored during training.
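To illustrate why unlabelled prompts are sufficient, the toy sketch below shows the data flow of one online rollout step: prompts are sampled, the current policy generates responses, and a reward model scores them, so no stored labels are read. `rollout_batch`, `toy_policy`, and `toy_reward` are hypothetical placeholders, and the PPO optimisation step that would consume these rewards (advantage estimation, clipped objective, KL penalty) is deliberately omitted.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    prompt: str
    response: str
    reward: float


def rollout_batch(
    prompts: List[str],
    policy_generate: Callable[[str], str],
    reward_model: Callable[[str, str], float],
    batch_size: int,
    rng: random.Random,
) -> List[Rollout]:
    """Sample prompts, generate on-policy responses, and score them.

    Only the prompt text is taken from the dataset; responses and
    rewards are produced on the fly.
    """
    batch = rng.sample(prompts, k=min(batch_size, len(prompts)))
    rollouts = []
    for prompt in batch:
        response = policy_generate(prompt)       # fresh sample from the current policy
        reward = reward_model(prompt, response)  # scalar score from a (toy) reward model
        rollouts.append(Rollout(prompt, response, reward))
    return rollouts


# Hypothetical stand-ins so the sketch runs end to end.
def toy_policy(prompt: str) -> str:
    return f"Answer to: {prompt}"


def toy_reward(prompt: str, response: str) -> float:
    # A real reward model would be a learned network; word count is a placeholder.
    return float(len(response.split()))


prompts = ["What is 2 + 3?", "Name a prime number.", "What is 10 / 2?"]
batch = rollout_batch(prompts, toy_policy, toy_reward, batch_size=2, rng=random.Random(0))
```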
## Citation
If you find this data useful, please cite:
```bibtex
@misc{ivison2024unpacking,
title={{Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback}},
author={Hamish Ivison and Yizhong Wang and Jiacheng Liu and Ellen Wu and Valentina Pyatkin and Nathan Lambert and Yejin Choi and Noah A. Smith and Hannaneh Hajishirzi},
year={2024},
eprint={2406.09279},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Provided by: maas
Created: 2025-05-28



