qihoo360/Light-R1-DPOData

Name: qihoo360/Light-R1-DPOData
Creator: qihoo360
Published: 2025-03-17 03:41:26
License: not specified

Source: Hugging Face. Updated 2025-03-17; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/qihoo360/Light-R1-DPOData

Description:

Light-R1 is a long chain-of-thought (CoT) model trained with curriculum SFT followed by DPO. It outperforms DeepSeek-R1-Distill-Qwen-32B on math tasks. The model is trained from scratch on decontaminated math datasets, and its performance is further improved by merging multiple models.
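Since this dataset targets the DPO stage, it may help to see what a single preference pair contributes to training. The following is a minimal sketch of the standard DPO loss for one (chosen, rejected) pair, not Light-R1's actual training code. The function name, argument names, and the default `beta=0.1` are assumptions made for illustration; the inputs are summed log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    All arguments are log-probabilities of a full response given the prompt.
    beta is a hypothetical default, not a value taken from Light-R1.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin by which the policy favors the chosen response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the chosen response and away from the rejected one.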

Provided by:

qihoo360
