Magpie-Pro-DPO-200K

Name: Magpie-Pro-DPO-200K
Creator: maas
Published: 2026-01-02 16:20:35
License: No description available

ModelScope community; updated 2026-01-02; indexed 2025-01-18

Download link:

https://modelscope.cn/datasets/Magpie-Align/Magpie-Pro-DPO-200K

Description:

This dataset is still under internal assessment. Please use it with caution!

To create this dataset, we carefully selected a diverse range of high-quality instructions from Magpie datasets, with a particular emphasis on Math and Coding tasks. We then generated responses from the Llama-3 base model using URIAL and used them as the rejected responses. Finally, we generated responses from Qwen2-72B-Instruct and Llama-3-8B-Instruct and took each instruction-response pair as the chosen response.

### Other Magpie DPO Datasets

After burning a lot of GPU hours, we observed that the following DPO datasets may perform better :)

|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
| [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [Magpie-Llama-3.1-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
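The pairing described above (an instruct-model response as chosen, a base-model response as rejected) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names `prompt`, `chosen`, and `rejected` follow a common DPO convention and are an assumption about the schema, and `PreferencePair`, `build_pair`, and `to_dpo_record` are hypothetical helpers. The response strings stand in for model outputs that would be generated elsewhere.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One DPO training example (field names are assumed, not confirmed)."""
    prompt: str
    chosen: str
    rejected: str


def build_pair(instruction: str, instruct_response: str, base_response: str) -> PreferencePair:
    """Pair an instruct-model response (chosen) with a base-model response (rejected)."""
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    if instruct_response == base_response:
        # Identical responses carry no preference signal, so reject the pair.
        raise ValueError("chosen and rejected responses must differ")
    return PreferencePair(prompt=instruction, chosen=instruct_response, rejected=base_response)


def to_dpo_record(pair: PreferencePair) -> dict:
    """Convert a pair to a plain dict, e.g. for writing one JSON line per example."""
    return {"prompt": pair.prompt, "chosen": pair.chosen, "rejected": pair.rejected}
```

A record built this way could be written out as one JSON object per line; whether the published dataset uses exactly these keys should be checked against the dataset files.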

Provider:

maas

Created:

2025-01-15
