rhamvjaja/Andre

Name: rhamvjaja/Andre
Creator: rhamvjaja
Published: 2025-12-10 06:49:39
License: MIT

Source: Hugging Face. Updated 2025-12-10; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/rhamvjaja/Andre


Description:

---
license: mit
---

This dataset is designed as a sanity test for RL algorithms on the [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model under an 8k context length. A reliable RL algorithm should achieve above 95% training accuracy with BF16 and above 98% with FP16.

This is the dataset used in the paper "Defeating the Training-Inference Mismatch via FP16".

For more details: https://arxiv.org/pdf/2510.26788

Reproducible code: https://github.com/sail-sg/Precision-RL

We construct this dataset by filtering out questions that are overly trivial or unsolvable for the initial model. Specifically, we sample 40 responses for each problem in the MATH dataset and keep only the problems whose initial accuracy is between 20% and 80%. This process yielded a targeted dataset of 1,460 questions for the DeepSeek-R1-Distill-Qwen-1.5B model. The smaller size of this dataset makes achieving near-100% accuracy computationally feasible, allowing for efficient and conclusive testing.

## Citation

If you find this dataset helpful in your research, please consider citing:

```
@article{qi2025precisionrl,
  title={Defeating the Training-Inference Mismatch via FP16},
  author={Qi, Penghui and Liu, Zichen and Zhou, Xiangxin and Pang, Tianyu and Du, Chao and Lee, Wee Sun and Lin, Min},
  journal={arXiv preprint arXiv:2510.26788},
  year={2025}
}
```
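The filtering step described in the dataset card (40 sampled responses per problem, keeping problems whose initial accuracy falls between 20% and 80%) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pass_rate` and `filter_by_accuracy`, the dictionary layout, and the choice of inclusive bounds are all assumptions made for illustration, and the correctness flags are toy data rather than real model outputs.

```python
from typing import Dict, List


def pass_rate(correct_flags: List[bool]) -> float:
    """Fraction of sampled responses that were graded correct."""
    if not correct_flags:
        raise ValueError("need at least one sampled response")
    return sum(correct_flags) / len(correct_flags)


def filter_by_accuracy(
    rollouts: Dict[str, List[bool]],
    low: float = 0.2,   # assumed inclusive lower bound
    high: float = 0.8,  # assumed inclusive upper bound
) -> List[str]:
    """Return IDs of problems whose pass rate lies in [low, high]."""
    return [
        problem_id
        for problem_id, flags in rollouts.items()
        if low <= pass_rate(flags) <= high
    ]


# Toy data: 40 hypothetical graded responses per problem.
rollouts = {
    "too_easy": [True] * 40,                  # 100% correct, dropped
    "unsolved": [False] * 40,                 # 0% correct, dropped
    "kept": [True] * 20 + [False] * 20,       # 50% correct, kept
}

kept_ids = filter_by_accuracy(rollouts)
print(kept_ids)
```

In a real pipeline the boolean flags would come from grading each sampled response against the reference answer; whether the 20% and 80% thresholds are inclusive is not stated in the card.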

Provider:

rhamvjaja
