
Magpie-Pro-DPO-200K

ModelScope Community. Updated: 2026-01-02. Indexed: 2025-01-18.
Download link:
https://modelscope.cn/datasets/Magpie-Align/Magpie-Pro-DPO-200K
Description:
This dataset is still under internal assessment. Please use it with caution!

To create this dataset, we carefully selected a diverse range of high-quality instructions from Magpie datasets, with a particular emphasis on Math and Coding tasks. We then generate responses from the Llama-3 base model using URIAL as rejected. Then, we generate responses from Qwen2-72B-Instruct and Llama-3-8B-Instruct and take the instruction-response pair as chosen.

### Other Magpie DPO Datasets

We observed that the following DPO datasets may have better performance after we burned a lot of GPU hours :)

| Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
| [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [Magpie-Llama-3.1-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
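
The chosen/rejected construction described in this summary can be illustrated with a minimal Python sketch. It only shows the general shape of a DPO preference pair; the field names `prompt`, `chosen`, and `rejected`, the `PreferencePair` class, the `build_pair` helper, and the filtering rule are all hypothetical and are not taken from the dataset's actual schema or pipeline.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PreferencePair:
    """One preference example: a prompt with a preferred and a dispreferred reply."""
    prompt: str
    chosen: str
    rejected: str


def build_pair(instruction: str, instruct_reply: str, base_reply: str) -> Optional[PreferencePair]:
    """Pair an instruct-model reply (chosen) with a base-model reply (rejected).

    Returns None when either reply is empty or both replies are identical.
    This filter is an illustrative assumption, not a documented step.
    """
    chosen = instruct_reply.strip()
    rejected = base_reply.strip()
    if not chosen or not rejected or chosen == rejected:
        return None
    return PreferencePair(prompt=instruction.strip(), chosen=chosen, rejected=rejected)


# Made-up strings standing in for real model outputs.
pair = build_pair(
    "What is 7 * 8?",
    "7 * 8 = 56.",            # stands in for a Qwen2-72B-Instruct or Llama-3-8B-Instruct reply
    "7 * 8 is 54, I think.",  # stands in for a Llama-3 base model reply generated with URIAL
)
record = asdict(pair)
```

Here `record` is a plain dictionary with one key per field, which is a common way to store preference pairs; check the dataset viewer for the real column names before relying on any of them.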

Provider:
maas
Created:
2025-01-15