sarosavo/RLEV
Source: Hugging Face. Updated: 2025-10-27. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/sarosavo/RLEV
Description:
This dataset contains approximately 100,000 real exam questions, each paired with its answer and a ground-truth value label, in both English and Chinese versions. The questions are intended for reinforcement learning that integrates human-defined value signals directly into the reward function when training large language models. Compared with correctness-only baselines, this approach yields significant improvements across a range of RL algorithms and model scales.
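To make the contrast with a correctness-only baseline concrete, here is a minimal sketch of a reward that takes a question's value label into account. It is an illustration only, not the dataset authors' reward function: the function names, the `max_value` normalization, and the choice to give zero reward for wrong answers are all assumptions.

```python
def correctness_only_reward(is_correct: bool) -> float:
    """Baseline: every correct answer earns the same reward."""
    return 1.0 if is_correct else 0.0


def value_weighted_reward(is_correct: bool, value: float, max_value: float) -> float:
    """Hypothetical value-aware reward (an assumption, not the authors' formula).

    A correct answer earns the question's value label scaled into [0, 1];
    an incorrect answer earns nothing.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return (value / max_value) if is_correct else 0.0
```

Under this sketch, a correct answer to a question worth 5 of a possible 10 points earns 0.5, while the baseline awards 1.0 regardless of the question's value.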
Provider:
sarosavo



