tulu-3-ultrafeedback-cleaned-on-policy-8b
ModelScope community · Updated 2025-12-26 · Indexed 2024-11-30
Download link:
https://modelscope.cn/datasets/LLM-Research/tulu-3-ultrafeedback-cleaned-on-policy-8b
Description:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="display:block; margin-left:auto; margin-right:auto"/>
# Llama 3.1 Tulu 3 Ultrafeedback (Cleaned) (on-policy 8B)
*Note that this collection is licensed under ODC-BY-1.0; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*
This preference dataset is part of our Tulu 3 preference mixture.
It contains prompts from [Ai2's cleaned version of Ultrafeedback](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned), which removes TruthfulQA instances.
We further filtered this dataset to remove instances from ShareGPT.
It contains 41.6k generation pairs (some of which are on-policy completions from https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) obtained using the following models:
- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 72B Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) (Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) (this dataset was partially "Built with Meta Llama 3" and is thus subject to the Llama 3 License)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) (Outputs produced by Claude are subject to Anthropic [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup))
## Completion Generation Approach:
Given a set of prompts, we generated the completions and preferences using a synthetic pipeline that combines on-policy and off-policy data. Preference annotations were obtained on four different aspects using the Ultrafeedback template and an LLM judge. The code for the synthetic generation pipeline can be found in the `scripts/synth_pref` directory of [open-instruct](https://github.com/allenai/open-instruct/).
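To make the annotation step more concrete, here is a minimal sketch of how judge ratings on four aspects could be turned into a single chosen/rejected pair. It is not the code from `scripts/synth_pref`. The aspect names (borrowed from the original UltraFeedback setup), the record keys, the rule "highest mean score is chosen, a random lower-scored completion is rejected", and the function name `build_preference_pair` are all assumptions made for illustration.

```python
import random
from statistics import mean

# Assumed aspect names, borrowed from the original UltraFeedback setup.
# The card itself only says "four different aspects".
ASPECTS = ("helpfulness", "instruction_following", "honesty", "truthfulness")


def build_preference_pair(prompt, completions, rng=None):
    """Turn judge-rated completions into one (chosen, rejected) pair.

    `completions` is a list of dicts with hypothetical keys:
    "model", "text", and "ratings" (aspect name -> numeric score).
    Returns None when every completion has the same mean score.
    """
    rng = rng or random.Random(0)
    # Average the per-aspect ratings into one overall score per completion.
    scored = [(mean(c["ratings"][a] for a in ASPECTS), c) for c in completions]
    best_score = max(score for score, _ in scored)
    chosen = next(c for score, c in scored if score == best_score)
    # Candidates for "rejected" must score strictly lower than the chosen one.
    lower = [c for score, c in scored if score < best_score]
    if not lower:
        return None  # all completions tie, so there is no usable preference
    rejected = rng.choice(lower)
    return {
        "prompt": prompt,
        "chosen": chosen["text"],
        "rejected": rejected["text"],
        "chosen_model": chosen["model"],
        "rejected_model": rejected["model"],
    }


if __name__ == "__main__":
    # Toy data: one on-policy completion and two off-policy ones (all made up).
    toy = [
        {"model": "on-policy", "text": "A",
         "ratings": dict.fromkeys(ASPECTS, 5)},
        {"model": "off-policy-1", "text": "B",
         "ratings": dict.fromkeys(ASPECTS, 3)},
        {"model": "off-policy-2", "text": "C",
         "ratings": dict.fromkeys(ASPECTS, 2)},
    ]
    print(build_preference_pair("Explain DPO in one sentence.", toy))
```

Returning `None` on a full tie keeps prompts without a clear preference out of the resulting pairs; whether the actual pipeline handles ties this way is not stated in this card.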
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use.
Provider:
maas
Created:
2024-11-23
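Dataset overview
This dataset is part of the Tulu 3 preference mixture. It is built on Ai2's cleaned version of Ultrafeedback, with TruthfulQA and ShareGPT instances removed, and contains 41.6k generation pairs, some of them generated with the Llama-3.1-Tulu-3-8B model. It combines outputs from a range of models, including Mistral, Yi, and MPT. The collection as a whole is licensed under ODC-BY-1.0, but some subsets carry non-commercial restrictions; it is intended for research use.
The summary above was collected and generated by 遇见数据集.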



