
olmoe-0125-1b-7b-preference-mix

ModelScope community; updated 2026-01-06; indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/allenai/olmoe-0125-1b-7b-preference-mix
Resource description:
<img alt="OLMo Logo" src="https://huggingface.co/allenai/OLMoE-1B-7B-0125/resolve/main/olmoe-logo.png" width="242px">

# OLMoE-1B-7B-0125-Instruct

*Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*

This mix is made up of the following on-policy preference datasets, generated using a synthetic data generation pipeline similar to Tulu 3:

- Reused prompts from the SFT mix (ai2-adapt-dev/sft_v3.9_used_on_policy_p0_olmoe_1b-7b and ai2-adapt-dev/sft_v3.9_used_on_policy_p1_olmoe_1b-7b)
- Reused prompts from the SFT mix filtered for instruction-following (ai2-adapt-dev/sft_v3.9_if_taxonomy_olmoe_1b-7b)
- Reused prompts in SFT subsampled from WildChat (ai2-adapt-dev/wildchat_v3.9_used_on_policy_olmoe_1b-7b and ai2-adapt-dev/WildChat-prefs-280824_olmoe_1b-7b)
- Cleaned version of Ultrafeedback without ShareGPT and TruthfulQA instances (ai2-adapt-dev/ultrafeedback_cleaned_olmoe_1b-7b)
- Prompts from WildChat that weren't used in the SFT mix (ai2-adapt-dev/wildchat_v3.9_unused_on_policy_olmoe_1b-7b)
- Prompts from DaringAnteater (ai2-adapt-dev/DaringAnteater-prefs_olmoe_1b-7b)
- Persona prompts with instruction following (allenai/tulu-3-pref-personas-instruction-following)

This preference mixture was used for DPO on our [OLMoE-1B-7B-0125-SFT](https://huggingface.co/allenai/OLMoE-1B-7B-0125-SFT) checkpoint to obtain [OLMoE-1B-7B-0125-DPO](https://huggingface.co/allenai/OLMoE-1B-7B-0125-DPO).
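
To make the role of a preference pair in DPO concrete, here is a minimal sketch of the standard DPO loss for a single pair. It is an illustration under assumptions, not the training code used for OLMoE: the card does not state which DPO variant or hyperparameters were used, the `beta` value is a placeholder, and the function and argument names are hypothetical. Each argument is the summed log-probability that the policy or the frozen SFT reference model assigns to the chosen or rejected response.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    beta=0.1 is an illustrative default, not a value taken from the card.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Raising the chosen log-probability and lowering the rejected one reduces the loss.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

In an actual training run this quantity would be averaged over batches of pairs and minimized with respect to the policy parameters, with the reference model kept fixed.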

The mix contains 366.7k generation pairs obtained using the following models:

- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Microsoft Phi 3 Mini 128k Instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) (MIT)
- [Microsoft Phi 3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) (MIT)
- [NuMind NuExtract v1.5](https://huggingface.co/numind/NuExtract-1.5) (MIT)

## License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use.

Provider:
maas
Created:
2025-05-28