
rhamvjaja/Andre

Hugging Face · Updated 2025-12-10 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/rhamvjaja/Andre
Resource description:
---
license: mit
---

This dataset is designed as a sanity test for RL algorithms on the [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model under an 8k context length. A reliable RL algorithm should achieve above 95% training accuracy for BF16 and 98% for FP16.

This is the dataset used in the paper: Defeating the Training-Inference Mismatch via FP16.

For more details: https://arxiv.org/pdf/2510.26788

Reproducible code: https://github.com/sail-sg/Precision-RL

We construct this dataset by filtering out questions that are overly trivial or unsolvable for the initial model. Specifically, we roll out 40 responses for each problem in the MATH dataset and keep only the problems where the initial accuracy is between 20% and 80%. This process yielded a targeted dataset of 1,460 questions for the DeepSeek-R1-Distill-Qwen-1.5B model. The smaller size of this dataset makes achieving near-100% accuracy computationally feasible, allowing for efficient and conclusive testing.

## Citation

If you find this dataset helpful in your research, please consider citing:

```
@article{qi2025precisionrl,
  title={Defeating the Training-Inference Mismatch via FP16},
  author={Qi, Penghui and Liu, Zichen and Zhou, Xiangxin and Pang, Tianyu and Du, Chao and Lee, Wee Sun and Lin, Min},
  journal={arXiv preprint arXiv:2510.26788},
  year={2025}
}
```
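
The accuracy-based filtering mentioned in the description (40 responses per problem, keep problems with initial accuracy between 20% and 80%) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `graded` mapping from problem id to per-response correctness flags, and the choice to treat both bounds as inclusive are assumptions made for illustration only.

```python
from typing import Dict, List


def initial_accuracy(outcomes: List[bool]) -> float:
    """Fraction of sampled responses that were graded correct."""
    if not outcomes:
        raise ValueError("need at least one graded response")
    return sum(outcomes) / len(outcomes)


def filter_problems(
    graded: Dict[str, List[bool]],
    low: float = 0.2,   # assumed inclusive lower bound (20%)
    high: float = 0.8,  # assumed inclusive upper bound (80%)
) -> List[str]:
    """Return ids of problems whose initial accuracy lies in [low, high]."""
    return [
        pid
        for pid, outcomes in graded.items()
        if low <= initial_accuracy(outcomes) <= high
    ]


# Toy data: 40 graded responses per hypothetical problem id.
graded = {
    "too_easy": [True] * 40,                  # 100% accuracy, dropped
    "unsolved": [False] * 40,                 # 0% accuracy, dropped
    "useful": [True] * 20 + [False] * 20,     # 50% accuracy, kept
}
kept = filter_problems(graded)
print(kept)
```

In this toy run only the problem with intermediate accuracy survives, which mirrors the stated goal of removing questions that are trivial or unsolvable for the initial model.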
Provided by:
rhamvjaja