olmo-2-1124-13b-preference-mix

Name: olmo-2-1124-13b-preference-mix
Creator: maas
Published: 2025-12-05 16:36:14
License: No description available

ModelScope community · Updated 2025-12-05 · Indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/allenai/olmo-2-1124-13b-preference-mix


Resource description:

# OLMo 2 1124 13B Preference Mixture

*Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*

This mix is made up of the following on-policy preference datasets, generated using a synthetic data generation pipeline similar to Tulu:

- Reused prompts from the SFT mix (via ai2-adapt-dev/sft_v3.9_used_on_policy_po_olmo2_13b and ai2-adapt-dev/sft_v3.9_used_on_policy_p1_olmo2_13b)
- Reused prompts from the SFT mix filtered for instruction-following (via ai2-adapt-dev/sft_v3.9_if_taxonomy_olmo2_13b)
- Reused prompts in SFT subsampled from WildChat (via ai2-adapt-dev/wildchat_v3.9_used_on_policy_olmo2_13b)
- Cleaned version of Ultrafeedback without ShareGPT and TruthfulQA instances (via ai2-adapt-dev/ultrafeedback_cleaned_olmo2_13b and ai2-adapt-dev/WildChat-prefs-280824_olmo2_13b)
- Prompts from WildChat that weren't used in the SFT mix (via ai2-adapt-dev/wildchat_v3.9_unused_on_policy_olmo2_13b)
- Prompts from DaringAnteater (via ai2-adapt-dev/DaringAnteater-prefs_olmo2_13b)

This preference mixture was used for DPO on our [OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) checkpoint to obtain [OLMo-2-1124-13B-DPO](https://huggingface.co/allenai/OLMo-2-1124-13B-DPO).

It contains 377.7k generation pairs obtained using the following models:

- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Microsoft Phi 3 Mini 128k Instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) (MIT)
- [Microsoft Phi 3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) (MIT)
- [NuMind NuExtract v1.5](https://huggingface.co/numind/NuExtract-1.5) (MIT)

## License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use.
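Because different generator models carry different terms, it can help to tag each preference pair with the terms of the models that produced it. The following is a minimal sketch, not part of the dataset tooling: the license labels are transcribed from the model list above, while the record fields `chosen_model` and `rejected_model`, the use of display names as model identifiers, and both helper functions are assumptions made for illustration.

```python
# License terms per generator model, transcribed from the model list above.
# Keys are the display names used in that list; the identifiers stored in the
# actual records may differ (assumption).
MODEL_LICENSES = {
    "Mistral 7B Instruct v0.2": "Apache 2.0",
    "Mistral Nemo Instruct 2407": "Apache 2.0",
    "Tulu 2 7B": "Ai2 ImpACT Low Risk License",
    "Tulu 2 13B": "Ai2 ImpACT Low Risk License",
    "Yi-34B-Chat": "Apache 2.0",
    "Yi-6B-Chat": "Apache 2.0",
    "MPT 30B Chat": "CC-BY-SA-4.0",
    "MPT 7B 8k Chat": "CC-BY-SA-4.0",
    "Google Gemma 2 27B it": "Gemma Terms of Use",
    "Google Gemma 2 9B it": "Gemma Terms of Use",
    "InternLM2.5 20B": "InternLM terms",
    "InternLM2.5 7B": "InternLM terms",
    "InternLM2.5 1.8B": "InternLM terms",
    "Falcon 7B": "Apache 2.0",
    "Qwen2.5 32B Instruct": "Apache 2.0",
    "Qwen2.5 14B Instruct": "Apache 2.0",
    "Qwen2.5 7B Instruct": "Apache 2.0",
    "GPT-4 Turbo": "OpenAI terms of use",
    "GPT-4o": "OpenAI terms of use",
    "Microsoft Phi 3 Mini 128k Instruct": "MIT",
    "Microsoft Phi 3.5 Mini Instruct": "MIT",
    "NuMind NuExtract v1.5": "MIT",
}


def pair_licenses(record):
    """Return the set of license labels that apply to one preference pair.

    Assumes hypothetical fields `chosen_model` and `rejected_model`.
    Models missing from MODEL_LICENSES are reported as "unknown".
    """
    return {
        MODEL_LICENSES.get(record[field], "unknown")
        for field in ("chosen_model", "rejected_model")
    }


def filter_by_license(records, allowed):
    """Keep only pairs whose two generator models both fall under `allowed`."""
    allowed = set(allowed)
    return [r for r in records if pair_licenses(r) <= allowed]


# Hypothetical records, for illustration only.
example_records = [
    {"chosen_model": "Qwen2.5 7B Instruct", "rejected_model": "Falcon 7B"},
    {"chosen_model": "GPT-4o", "rejected_model": "Yi-6B-Chat"},
    {"chosen_model": "Microsoft Phi 3.5 Mini Instruct", "rejected_model": "MPT 30B Chat"},
]

permissive_only = filter_by_license(example_records, {"Apache 2.0", "MIT"})
```

Which terms are acceptable for a given use is a decision for the user; the sketch only groups pairs by label and does not interpret any license.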


Provider:

maas

Created:

2025-05-27


Background overview

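This dataset is the OLMo 2 1124 13B preference mixture, released by AllenAI under the ODC-BY-1.0 license and intended primarily for research. It combines prompts from several sources, including the SFT mix, WildChat, and Ultrafeedback, and contains 377.7k generation pairs used for direct preference optimization (DPO) training on the OLMo-2-1124-13B-SFT checkpoint. The generating models include Mistral, Tulu, GPT-4, and others.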

The content above was collected and summarized by 遇见数据集.