
orpo-dpo-mix-40k

ModelScope (魔搭社区). Updated: 2026-05-20. Indexed: 2024-06-01.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/orpo-dpo-mix-40k
Resource description:
# ORPO-DPO-mix-40k v1.2

![image/webp](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/s3uIwTgVl1sTm5_AX3rXH.webp)

This dataset is designed for [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer#expected-dataset-format) or [DPO](https://huggingface.co/docs/trl/main/en/dpo_trainer#expected-dataset-format) training. See [Fine-tune Llama 3 with ORPO](https://huggingface.co/blog/mlabonne/orpo-llama-3) for more information about how to use it.

It is a combination of the following high-quality DPO datasets:

- [`argilla/Capybara-Preferences`](https://huggingface.co/datasets/argilla/Capybara-Preferences): highly scored chosen answers >=5 (7,424 samples)
- [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs): highly scored chosen answers >=9, not in GSM8K (2,299 samples)
- [`argilla/ultrafeedback-binarized-preferences-cleaned`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned): highly scored chosen answers >=5 (22,799 samples)
- [`argilla/distilabel-math-preference-dpo`](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo): highly scored chosen answers >=9 (2,181 samples)
- [`unalignment/toxic-dpo-v0.2`](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2) (541 samples)
- [`M4-ai/prm_dpo_pairs_cleaned`](https://huggingface.co/datasets/M4-ai/prm_dpo_pairs_cleaned) (7,958 samples)
- [`jondurbin/truthy-dpo-v0.1`](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1) (1,016 samples)

Rule-based filtering was applied to remove gptisms in the chosen answers (2,206 samples).

Thanks to [argilla](https://huggingface.co/argilla), [unalignment](https://huggingface.co/unalignment), [M4-ai](https://huggingface.co/M4-ai), and [jondurbin](https://huggingface.co/jondurbin) for providing the source datasets.

## 🔎 Usage

v1.2 adds a `question` column to ensure compatibility with both DPO and ORPO formats in Axolotl.
Here's an example as an ORPO dataset in Axolotl:

```yaml
rl: orpo
orpo_alpha: 0.1
chat_template: chatml
datasets:
  - path: mlabonne/orpo-dpo-mix-40k
    type: chat_template.argilla
    chat_template: chatml
```

For DPO, I recommend using [mlabonne/orpo-dpo-mix-40k-flat](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k-flat) instead.

## Toxicity

Note that ORPO-DPO-mix-40k contains a dataset (`toxic-dpo-v0.2`) designed to prompt the model to answer illegal questions. You can remove it as follows:

```python
from datasets import load_dataset

# Load the training split, then keep only rows that do not come from toxic-dpo-v0.2.
dataset = load_dataset('mlabonne/orpo-dpo-mix-40k', split='train')
dataset = dataset.filter(
    lambda r: r["source"] != "toxic-dpo-v0.2"
)
```

## History

I'm saving previous versions of this dataset in different branches.

- [v1.0](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k/tree/v1.0)
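As a complement to the Toxicity section, it can help to check how many rows each source contributes before and after filtering. The following is a minimal sketch rather than part of the original card: the helper names `count_by_source` and `drop_sources` are hypothetical, and the toy rows stand in for the real dataset. Only the `source` column and the `toxic-dpo-v0.2` label come from the card; the other label used here is an assumption for illustration.

```python
from collections import Counter


def count_by_source(rows):
    """Count rows per value of the `source` column."""
    return Counter(row["source"] for row in rows)


def drop_sources(rows, excluded):
    """Return the rows whose `source` is not in `excluded`."""
    excluded = set(excluded)
    return [row for row in rows if row["source"] not in excluded]


# Toy stand-in for the dataset. "truthy-dpo-v0.1" is an assumed label;
# inspect the real `source` values before relying on any of them.
rows = [
    {"source": "toxic-dpo-v0.2", "question": "q1"},
    {"source": "truthy-dpo-v0.1", "question": "q2"},
    {"source": "truthy-dpo-v0.1", "question": "q3"},
]

counts = count_by_source(rows)
kept = drop_sources(rows, ["toxic-dpo-v0.2"])

print(dict(counts))
print(len(kept))
```

The same helpers should work on the loaded dataset, since iterating over a `datasets.Dataset` yields one dict per row. Comparing the counts before and after filtering is a quick way to confirm that only the intended source was removed.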

Provider: maas
Created: 2024-05-28
Summary
Dataset introduction
Background and challenges
Background overview
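This dataset is designed for ORPO or DPO training. It combines several high-quality DPO datasets, contains roughly 40k samples, and has been rule-filtered to remove GPT-style phrasing. It adds a `question` column for compatibility with both DPO and ORPO formats in training frameworks such as Axolotl.
The summary above was collected and generated by 遇见数据集.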