Magpie-DPO-100K-SML

Source: ModelScope community. Updated: 2025-11-27. Indexed: 2025-01-18.
Download link:
https://modelscope.cn/datasets/Magpie-Align/Magpie-DPO-100K-SML
Resource description:
This dataset is still under internal assessment. Please use it with caution!

To create this dataset, we first generate responses from the base model using URIAL and use them as the rejected responses. Then, we generate responses from the 8B, 70B, and 405B models, and take the instruction-response pair with the highest reward as the chosen response.

### Other Magpie DPO Datasets

We observed that the following DPO datasets may have better performance after we burned a lot of GPU hours :)

| Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |
| [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [Magpie-Llama-3.1-Pro-DPO-100K](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1) | DPO | DPO dataset via Best-of-N sampling and rewards. |

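The construction described above (base-model response as rejected, highest-reward response across several models as chosen) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Candidate` class, the `build_preference_pair` function, the output field names, and the reward values are all hypothetical, and the actual dataset schema and reward model are not stated in the description.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate response (hypothetical structure for illustration)."""
    model: str      # label for the model that produced the response
    response: str   # generated text
    reward: float   # score assigned by some reward model (assumed given)


def build_preference_pair(instruction, base_response, candidates):
    """Pair the base-model response (rejected) with the highest-reward
    candidate (chosen). Field names in the result are assumptions."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    best = max(candidates, key=lambda c: c.reward)
    return {
        "instruction": instruction,
        "chosen": best.response,
        "chosen_model": best.model,
        "rejected": base_response,
    }


# Usage with made-up responses and rewards.
pair = build_preference_pair(
    instruction="Explain what DPO is in one sentence.",
    base_response="DPO is a thing.",
    candidates=[
        Candidate("8B", "DPO trains on preference pairs.", 0.41),
        Candidate("70B", "DPO optimizes a policy directly from preference pairs.", 0.87),
        Candidate("405B", "DPO is a preference-based fine-tuning method.", 0.79),
    ],
)
```

Taking the maximum over all candidates means the chosen response can come from any of the three model sizes, depending on which one the reward model scores highest for a given instruction.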
Provider:
maas
Created:
2025-01-15