openbmb/RLPR-Evaluation
Source: Hugging Face. Updated: 2025-07-11. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/openbmb/RLPR-Evaluation
Description:
RLPR-Evaluation is a dataset for evaluating the reasoning capabilities of the RLPR framework. It comprises seven benchmarks covering mathematical and general-domain reasoning: MATH-500, Minerva, AIME24, MMLU-Pro, GPQA, TheoremQA, and the WebInstruct validation split. Together they assess reasoning across diverse domains and difficulty levels.
Provider:
openbmb



