five

orpo-dpo-mix-40k-flat

收藏
魔搭社区2025-11-27 更新2025-03-22 收录
下载链接:
https://modelscope.cn/datasets/mlabonne/orpo-dpo-mix-40k-flat
下载链接
链接失效反馈
官方服务:
资源简介:
# ORPO-DPO-mix-40k-flat ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/s3uIwTgVl1sTm5_AX3rXH.webp) This dataset is designed for [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer#expected-dataset-format) or [DPO](https://huggingface.co/docs/trl/main/en/dpo_trainer#expected-dataset-format) training. See [Uncensor any LLM with Abliteration](https://huggingface.co/blog/mlabonne/abliteration) for more information about how to use it. This is version with raw text instead of lists of dicts as in the original version [here](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k). It makes easier to parse in Axolotl, especially for DPO. ORPO-DPO-mix-40k-flat is a combination of the following high-quality DPO datasets: - [`argilla/Capybara-Preferences`](https://huggingface.co/datasets/argilla/Capybara-Preferences): highly scored chosen answers >=5 (7,424 samples) - [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs): highly scored chosen answers >=9, not in GSM8K (2,299 samples) - [`argilla/ultrafeedback-binarized-preferences-cleaned`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned): highly scored chosen answers >=5 (22,799 samples) - [`argilla/distilabel-math-preference-dpo`](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo): highly scored chosen answers >=9 (2,181 samples) - [`unalignment/toxic-dpo-v0.2`](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2) (541 samples) - [`M4-ai/prm_dpo_pairs_cleaned`](https://huggingface.co/datasets/M4-ai/prm_dpo_pairs_cleaned) (7,958 samples) - [`jondurbin/truthy-dpo-v0.1`](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1) (1,016 samples) Rule-based filtering was applied to remove gptisms in the chosen answers (2,206 samples). Thanks to [argilla](https://huggingface.co/argilla), [unalignment](https://huggingface.co/unalignment), [M4-ai](https://huggingface.co/M4-ai), and [jondurbin](https://huggingface.co/jondurbin) for providing the source datasets. ## 🔎 Usage Here's an example on how to use it as a DPO dataset in Axolotl with ChatML: ```yaml rl: dpo chat_template: chatml datasets: - path: mlabonne/orpo-dpo-mix-40k type: chatml.intel ``` For ORPO, I recommend using [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) instead. ## Toxicity Note that ORPO-DPO-mix-40k-flat contains a dataset (`toxic-dpo-v0.2`) designed to prompt the model to answer illegal questions. You can remove it as follows: ```python dataset = load_dataset('mlabonne/orpo-mix-40k-flat', split='train') dataset = dataset.filter( lambda r: r["source"] != "toxic-dpo-v0.2" ) ```

# ORPO-DPO-mix-40k-flat ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/s3uIwTgVl1sTm5_AX3rXH.webp) 本数据集专为**ORPO**或**DPO**训练设计,相关说明可参阅[ORPO训练器](https://huggingface.co/docs/trl/main/en/orpo_trainer#expected-dataset-format)与[DPO训练器](https://huggingface.co/docs/trl/main/en/dpo_trainer#expected-dataset-format)的官方文档。如需了解其使用方法的更多细节,请参阅《借助Abliteration为任意大语言模型(LLM)解除内容审查》(https://huggingface.co/blog/mlabonne/abliteration)。 本版本采用原始文本格式,而非原始版本[此处](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)中的字典列表格式,可在Axolotl训练框架中更便捷地进行解析,尤其适用于DPO训练场景。 ORPO-DPO-mix-40k-flat 由以下优质DPO数据集整合而成: - [`argilla/Capybara-Preferences`](https://huggingface.co/datasets/argilla/Capybara-Preferences):选取得分≥5的优质回复(共7,424条样本) - [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs):选取得分≥9且未包含在GSM8K数据集中的优质回复(共2,299条样本) - [`argilla/ultrafeedback-binarized-preferences-cleaned`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned):选取得分≥5的优质回复(共22,799条样本) - [`argilla/distilabel-math-preference-dpo`](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo):选取得分≥9的优质回复(共2,181条样本) - [`unalignment/toxic-dpo-v0.2`](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2)(共541条样本) - [`M4-ai/prm_dpo_pairs_cleaned`](https://huggingface.co/datasets/M4-ai/prm_dpo_pairs_cleaned)(共7,958条样本) - [`jondurbin/truthy-dpo-v0.1`](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)(共1,016条样本) 已通过基于规则的过滤操作,移除了回复中的GPT式表达(共2,206条样本)。 感谢[argilla](https://huggingface.co/argilla)、[unalignment](https://huggingface.co/unalignment)、[M4-ai](https://huggingface.co/M4-ai)及[jondurbin](https://huggingface.co/jondurbin)提供的原始数据集。 ## 🔎 使用方法 以下示例展示了如何在Axolotl中以ChatML格式将其用作DPO数据集: yaml rl: dpo chat_template: chatml datasets: - path: mlabonne/orpo-dpo-mix-40k type: chatml.intel 若需使用ORPO训练,建议改用[mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)数据集。 ## 毒性说明 请注意,ORPO-DPO-mix-40k-flat 包含一个名为`toxic-dpo-v0.2`的数据集,其设计目的是诱导模型回答违规问题。你可通过以下方式移除该数据集: python dataset = load_dataset('mlabonne/orpo-mix-40k-flat', split='train') dataset = dataset.filter( lambda r: r["source"] != "toxic-dpo-v0.2" )
提供机构:
maas
创建时间:
2025-03-18
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作