RLVR-SvS/Variational-DAPO
Source: Hugging Face. Updated: 2025-08-23. Indexed: 2025-11-30.
Download link:
https://hf-mirror.com/datasets/RLVR-SvS/Variational-DAPO
Summary:
This dataset contains 314k variational problems synthesized by the Qwen2.5-32B-Instruct policy during 600 steps of RLVR training on DAPO-17k. Each problem is paired with a reference answer. The problems were deduplicated with MinHash (min_hash) at a threshold of 0.85 to preserve diversity among them.
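To make the deduplication step concrete, here is a minimal sketch of MinHash-based near-duplicate filtering at a 0.85 threshold. Only the threshold comes from the listing. The word 3-gram shingling, the 128-value signature, the BLAKE2b hashing, and the function names (`shingles`, `minhash_signature`, `estimated_jaccard`, `deduplicate`) are assumptions made for illustration and may differ from what the dataset authors actually ran.

```python
import hashlib

NUM_HASHES = 128  # assumption: the listing does not state the signature size


def shingles(text, n=3):
    """Return the set of word n-grams in text (the whole text if it is shorter)."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt gives a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def deduplicate(problems, threshold=0.85):
    """Greedily keep a problem only if it is below threshold against all kept ones."""
    kept, kept_signatures = [], []
    for problem in problems:
        sig = minhash_signature(problem)
        if all(estimated_jaccard(sig, other) < threshold for other in kept_signatures):
            kept.append(problem)
            kept_signatures.append(sig)
    return kept
```

This pairwise comparison is quadratic in the number of problems, so it only suits small samples. At the scale of hundreds of thousands of problems, a locality-sensitive hashing index over the signatures would normally replace the inner loop.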
Provider:
RLVR-SvS



