qihoo360/Light-R1-DPOData
Source: Hugging Face. Updated: 2025-03-17. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/qihoo360/Light-R1-DPOData
Description:
Light-R1 is a long chain-of-thought (CoT) model trained with curriculum SFT and DPO. It outperforms DeepSeek-R1-Distill-Qwen-32B on math tasks. The model is trained from scratch on decontaminated math datasets, and merging multiple models improves its performance further.
Provider:
qihoo360



