sail/Sanity-Test-R1D-1.5B

Name: sail/Sanity-Test-R1D-1.5B
Creator: sail
Published: 2025-11-15 08:21:50
License: No description available

Source: Hugging Face | Updated: 2025-11-15 | Indexed: 2025-11-15

Download link:

https://hf-mirror.com/datasets/sail/Sanity-Test-R1D-1.5B


Resource description:

This dataset is a sanity test for reinforcement learning (RL) algorithms applied to the `DeepSeek-R1-Distill-Qwen-1.5B` model at an 8k context length. An algorithm passes if it reaches at least 95% training accuracy under BF16 precision and at least 98% training accuracy under FP16 precision. The dataset supports the experiments in the research paper "Defeating the Training-Inference Mismatch via FP16".
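The pass criteria in the description can be written as a small check. This is only a minimal sketch: the names `SANITY_THRESHOLDS` and `passes_sanity_test` are hypothetical and do not come from the dataset or the paper, and the accuracy value is assumed to be a fraction between 0 and 1 that your own training run reports.

```python
# Hypothetical helper: these names are illustrative, not part of the dataset or paper.
# Minimum training accuracy stated in the description for each precision.
SANITY_THRESHOLDS = {"bf16": 0.95, "fp16": 0.98}


def passes_sanity_test(train_accuracy: float, precision: str) -> bool:
    """Return True if train_accuracy meets the stated minimum for the given precision."""
    key = precision.lower()
    if key not in SANITY_THRESHOLDS:
        raise ValueError(f"unknown precision: {precision!r}")
    return train_accuracy >= SANITY_THRESHOLDS[key]
```

For instance, a run that reaches 96% training accuracy would meet the BF16 bar but fall short of the stricter FP16 bar.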

Provided by:

sail
