tulu-3-wildchat-if-on-policy-8b

Name: tulu-3-wildchat-if-on-policy-8b
Creator: maas
Published: 2025-12-04 16:18:19
License: No description available

ModelScope community · Updated 2025-12-04 · Indexed 2024-11-30

Download link:

https://modelscope.cn/datasets/LLM-Research/tulu-3-wildchat-if-on-policy-8b


Resource overview:

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:auto; margin-right:auto; display:block"/>

# Llama 3.1 Tulu 3 Wildchat IF (on-policy 8b)

*Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*

This preference dataset is part of our Tulu 3 preference mixture. It contains prompts from [WildChat](allenai/WildChat-1M), which include constraints, and 10,792 generation pairs (some of which are on-policy, from allenai/Llama-3.1-Tulu-3-8B) obtained using the following models:

- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 72B Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) (Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) (this dataset was partially "Built with Meta Llama 3" and is thus subject to the Llama 3 License)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) (Outputs produced by Claude are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup))

## Completion Generation Approach:

Given a set of prompts, we generated the completions and preferences using a synthetic pipeline that combines both on-policy and off-policy data, and obtained the preference annotations on four different aspects using the Ultrafeedback template and an LLM judge. The code for the synthetic generation pipeline can be found in the scripts/synth_pref directory of [open-instruct](https://github.com/allenai/open-instruct/).

## License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use.
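As a rough illustration of the completion generation approach described above, the sketch below turns per-aspect judge ratings into a single chosen/rejected pair. It is not the open-instruct implementation: the aspect names, the record layout, the helper `build_preference_pair`, and the rule of taking the lowest-scoring completion as rejected are all assumptions made for this example (the card only states that four aspects were rated by an LLM judge).

```python
from statistics import mean

# Hypothetical aspect names; the card only says "four different aspects".
ASPECTS = ("helpfulness", "honesty", "instruction_following", "truthfulness")


def build_preference_pair(prompt, completions):
    """Build one preference pair from judge-rated completions.

    `completions` is a list of dicts with keys "model", "text" and
    "ratings" (a mapping from aspect name to a numeric score).
    The layout is an assumption for this sketch.
    """
    if len(completions) < 2:
        raise ValueError("need at least two completions to form a pair")

    # Rank completions by their mean score across the four aspects.
    ranked = sorted(
        completions,
        key=lambda c: mean(c["ratings"][a] for a in ASPECTS),
        reverse=True,
    )
    chosen, rejected = ranked[0], ranked[-1]  # assumption: worst one is rejected
    return {
        "prompt": prompt,
        "chosen": chosen["text"],
        "rejected": rejected["text"],
        "chosen_model": chosen["model"],
        "rejected_model": rejected["model"],
    }


# Toy data: one on-policy and two off-policy completions (all made up).
completions = [
    {"model": "on-policy-8b", "text": "A",
     "ratings": {"helpfulness": 4, "honesty": 4,
                 "instruction_following": 5, "truthfulness": 4}},
    {"model": "off-policy-x", "text": "B",
     "ratings": {"helpfulness": 2, "honesty": 3,
                 "instruction_following": 1, "truthfulness": 2}},
    {"model": "off-policy-y", "text": "C",
     "ratings": {"helpfulness": 3, "honesty": 3,
                 "instruction_following": 3, "truthfulness": 3}},
]
pair = build_preference_pair("Reply in exactly three words.", completions)
```

Averaging the aspect scores is only one plausible way to aggregate them; a weighted sum or a per-aspect comparison would fit the same record layout.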


Provider:

maas

Created:

2024-11-23

Collection summary

Dataset introduction