sail/Sanity-Test-R1D-1.5B
Source: Hugging Face. Updated: 2025-11-15. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/sail/Sanity-Test-R1D-1.5B
Description:
This dataset is a sanity test for reinforcement learning (RL) algorithms applied to the `DeepSeek-R1-Distill-Qwen-1.5B` model at an 8k context length. An algorithm is expected to reach at least 95% training accuracy in BF16 precision and at least 98% in FP16 precision. The dataset supports the experiments in the research paper "Defeating the Training-Inference Mismatch via FP16".
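The pass criteria in the description can be written as a small check. This is a minimal sketch, not part of the dataset or the paper's code: the names `MIN_TRAIN_ACCURACY` and `passes_sanity_test` are hypothetical, and accuracy is assumed to be a fraction between 0 and 1.

```python
# Minimum training accuracy per precision, taken from the description above.
# The dictionary and function names are hypothetical, chosen for illustration.
MIN_TRAIN_ACCURACY = {"bf16": 0.95, "fp16": 0.98}


def passes_sanity_test(precision: str, train_accuracy: float) -> bool:
    """Return True if the accuracy meets the threshold for the given precision."""
    key = precision.lower()
    if key not in MIN_TRAIN_ACCURACY:
        raise ValueError(f"unsupported precision: {precision!r}")
    if not 0.0 <= train_accuracy <= 1.0:
        raise ValueError("train_accuracy must be a fraction in [0, 1]")
    return train_accuracy >= MIN_TRAIN_ACCURACY[key]
```

For example, a run that reaches 0.96 training accuracy would pass under BF16 but not under FP16.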
Provider:
sail



