olmo-2-1124-7b-preference-mix
Source: ModelScope community. Updated 2025-12-05; indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/olmo-2-1124-7b-preference-mix
Description:
# OLMo 2 1124 7B Preference Mixture
*Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*
This mix is made up of the following on-policy preference datasets generated using a synthetic data generation pipeline similar to Tulu 3:
- Reused prompts from the SFT mix (via ai2-adapt-dev/sft_v3.9_used_on_policy_po_olmo2_7b and ai2-adapt-dev/sft_v3.9_used_on_policy_p1_olmo2_7b)
- Reused prompts from the SFT mix filtered for instruction-following (via ai2-adapt-dev/sft_v3.9_if_taxonomy_olmo2_7b)
- Reused prompts from the SFT mix that were subsampled from WildChat (via ai2-adapt-dev/wildchat_v3.9_used_on_policy_olmo2_7b)
- Cleaned version of Ultrafeedback without ShareGPT and TruthfulQA instances (via ai2-adapt-dev/ultrafeedback_cleaned_olmo2_7b)
- Prompts from WildChat that weren't used in the SFT mix (via ai2-adapt-dev/wildchat_v3.9_unused_on_policy_olmo2_7b)
- Prompts from DaringAnteater (via ai2-adapt-dev/DaringAnteater-prefs_olmo2_7b)
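For inspecting the components individually, the subsets listed above can be collected into a small manifest. The sketch below is only an illustration, not the pipeline used to build the mix: the repo IDs are copied from the list, while the helper `load_subsets`, the `train` split name, and the assumption that each repo is publicly readable through the Hugging Face `datasets` library are hypothetical.

```python
# Repo IDs copied verbatim from the list above.
SUBSET_REPOS = [
    "ai2-adapt-dev/sft_v3.9_used_on_policy_po_olmo2_7b",
    "ai2-adapt-dev/sft_v3.9_used_on_policy_p1_olmo2_7b",
    "ai2-adapt-dev/sft_v3.9_if_taxonomy_olmo2_7b",
    "ai2-adapt-dev/wildchat_v3.9_used_on_policy_olmo2_7b",
    "ai2-adapt-dev/ultrafeedback_cleaned_olmo2_7b",
    "ai2-adapt-dev/wildchat_v3.9_unused_on_policy_olmo2_7b",
    "ai2-adapt-dev/DaringAnteater-prefs_olmo2_7b",
]


def load_subsets(repos=SUBSET_REPOS, split="train"):
    """Load each subset into a dict keyed by repo ID (hypothetical helper).

    Assumption: every repo is public and exposes the given split.
    The import is deferred so the manifest is usable without `datasets`.
    """
    from datasets import load_dataset  # pip install datasets

    return {repo: load_dataset(repo, split=split) for repo in repos}
```

Calling `load_subsets()` needs network access; the manifest itself can be used offline, for example to check which sources a local copy covers.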
This preference mixture was used for DPO on our [OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) checkpoint to obtain [OLMo-2-1124-7B-DPO](https://huggingface.co/allenai/OLMo-2-1124-7B-DPO).
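To make concrete what DPO does with a chosen/rejected pair, here is a minimal sketch of the standard DPO loss for a single pair. It assumes the inputs are summed sequence log-probabilities under the policy and under a frozen reference model (in standard DPO the reference is the starting checkpoint, which here would be the SFT model). The function name and `beta=0.1` are illustrative choices, not the settings used to train OLMo 2.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch)."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)) written as softplus(-beta * margin),
    # in a form that avoids overflow for large arguments.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and the reference agree, the margin is zero and the loss equals `log(2)`; the loss falls as the policy shifts probability toward the chosen response.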
It contains 366.7k generation pairs obtained using the following models:
- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Microsoft Phi 3 Mini 128k Instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) (MIT)
- [Microsoft Phi 3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) (MIT)
- [NuMind NuExtract v1.5](https://huggingface.co/numind/NuExtract-1.5) (MIT)
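Because the generator models carry different licenses, it can be useful to select pairs by generator. The sketch below keeps only pairs in which both responses come from models listed above under Apache 2.0 or MIT. It rests on assumptions that should be checked against the actual schema: the field names `chosen_model` and `rejected_model`, and the use of Hugging Face repo IDs (taken from the links above) as model identifiers, are hypothetical.

```python
# Models listed above under Apache 2.0 or MIT, keyed by the repo ID in their links.
APACHE_OR_MIT_MODELS = {
    "mistralai/Mistral-7B-Instruct-v0.2",
    "mistralai/Mistral-Nemo-Instruct-2407",
    "01-ai/Yi-34B-Chat",
    "01-ai/Yi-6B-Chat",
    "tiiuae/falcon-7b-instruct",
    "Qwen/Qwen2.5-32B-Instruct",
    "Qwen/Qwen2.5-14B-Instruct",
    "Qwen/Qwen2.5-7B-Instruct",
    "microsoft/Phi-3-mini-128k-instruct",
    "microsoft/Phi-3.5-mini-instruct",
    "numind/NuExtract-1.5",
}


def both_generators_listed(pair, allowed=APACHE_OR_MIT_MODELS):
    """True if both responses in the pair come from models in `allowed`.

    Assumption: each pair is a dict with `chosen_model` and `rejected_model`
    fields. Any model not in the set, including ones the list does not
    name, causes the pair to be excluded.
    """
    return (pair.get("chosen_model") in allowed
            and pair.get("rejected_model") in allowed)


# Toy records for illustration only; not real rows from the dataset.
pairs = [
    {"chosen_model": "Qwen/Qwen2.5-7B-Instruct",
     "rejected_model": "01-ai/Yi-6B-Chat"},
    {"chosen_model": "google/gemma-2-9b-it",
     "rejected_model": "tiiuae/falcon-7b-instruct"},
]
kept = [p for p in pairs if both_generators_listed(p)]
```

Excluding unknown models by default is the conservative choice here: a pair is kept only when both generators are positively identified. The prompt sources carry their own terms as well, so a generator-based filter covers only one part of the licensing picture.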
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use.
Provider:
maas
Created:
2025-05-27



