Magpie-Pro-DPO-200K
ModelScope · Updated 2026-01-02 · Indexed 2025-01-18
Download link:
https://modelscope.cn/datasets/Magpie-Align/Magpie-Pro-DPO-200K
Description:
This dataset is still under internal assessment. Please use it with caution!
To create this dataset, we carefully selected a diverse range of high-quality instructions from the Magpie datasets, with a particular emphasis on Math and Coding tasks. We then used URIAL to generate responses from the Llama-3 base model and took these as the rejected responses. Next, we generated responses with Qwen2-72B-Instruct and Llama-3-8B-Instruct and took the resulting instruction-response pairs as the chosen responses.
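The chosen/rejected pairing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names `instruction`, `chosen`, and `rejected` and the helper `build_pair` are hypothetical, and the responses are toy strings rather than real model outputs. The card does not state whether each instruct model's response forms its own pair, so the sketch pairs a single chosen response with a single rejected one.

```python
from dataclasses import dataclass, asdict


@dataclass
class PreferencePair:
    """One DPO training example (hypothetical field names)."""
    instruction: str
    chosen: str    # response from an instruct model, e.g. Qwen2-72B-Instruct
    rejected: str  # response from the Llama-3 base model prompted with URIAL


def build_pair(instruction: str, instruct_response: str, base_response: str) -> dict:
    # Chosen = aligned instruct-model output; rejected = URIAL base-model output.
    return asdict(PreferencePair(instruction=instruction,
                                 chosen=instruct_response,
                                 rejected=base_response))


pair = build_pair(
    "What is 12 * 7?",
    "12 * 7 = 84.",       # toy stand-in for an instruct-model response
    "12 times 7 is 74.",  # toy stand-in for a URIAL base-model response
)
print(pair["chosen"])
```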
### Other Magpie DPO Datasets
After burning a lot of GPU hours, we observed that the following DPO datasets may yield better performance than this one :)
|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards.
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards.
| [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [Magpie-Llama-3.1-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards.
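The datasets in the table are described as built "via Best-of-N sampling and rewards". A common way to do this, assumed here rather than taken from those dataset cards, is to sample N responses per instruction, score each with a reward model, and keep the highest-scoring response as chosen and the lowest-scoring one as rejected. The functions `best_of_n_pair` and `toy_reward` are hypothetical; `toy_reward` is a crude stand-in for a real reward model.

```python
from typing import Callable, Sequence


def best_of_n_pair(instruction: str,
                   responses: Sequence[str],
                   reward_fn: Callable[[str, str], float]) -> dict:
    """Pick the highest- and lowest-reward responses as chosen/rejected.

    reward_fn is a stand-in for a reward model scoring (instruction, response).
    """
    if len(responses) < 2:
        raise ValueError("need at least two sampled responses")
    # Sort ascending by reward: first element is worst, last is best.
    scored = sorted(responses, key=lambda r: reward_fn(instruction, r))
    return {"instruction": instruction, "chosen": scored[-1], "rejected": scored[0]}


def toy_reward(instruction: str, response: str) -> float:
    # Hypothetical toy reward: +1 if the response contains the correct answer "84",
    # minus a small penalty proportional to response length.
    return (1.0 if "84" in response else 0.0) - 0.01 * len(response)


# N = 4 toy samples standing in for responses drawn from an instruct model.
samples = ["84", "It is 74.", "The answer is 84.", "I don't know."]
pair = best_of_n_pair("What is 12 * 7?", samples, toy_reward)
print(pair)
```

Taking the two ends of the reward ranking keeps the reward gap between chosen and rejected as large as possible for each instruction; other selection rules (for example, a random lower-ranked sample as rejected) are equally plausible.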
Provider:
maas
Created:
2025-01-15



