RyanYr/grpo-dapo-qwen3-4B-Base-mbs128-n4_mmlupro
收藏Hugging Face2026-04-27 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/RyanYr/grpo-dapo-qwen3-4B-Base-mbs128-n4_mmlupro
下载链接
链接失效反馈官方服务:
资源简介:
这是一个用于奖励模型评估的数据集,包含结构化特征:提示(由角色和内容组成)、数据源、奖励模型信息(包括真实值和风格)以及多个响应。数据集被分割为10个测试子集(从test.10到test.100),每个子集包含12032个示例,适用于测试和评估模型在不同条件下的性能表现。
This is a dataset for reward model evaluation, containing structured features: prompt (composed of role and content), data source, reward model information (including ground truth and style), and multiple responses. The dataset is divided into 10 test subsets (from test.10 to test.100), each with 12032 examples, suitable for testing and evaluating model performance under different conditions.
提供机构:
RyanYr



