RyanYr/grpo-dapo_shuffled-01_offline-grpo-dapo-qwen3-4B-Base-mbs128-n4_llama3.1-8B-It_matheval

Name: RyanYr/grpo-dapo_shuffled-01_offline-grpo-dapo-qwen3-4B-Base-mbs128-n4_llama3.1-8B-It_matheval
Creator: RyanYr
Published: 2026-04-29 06:26:16
License: 暂无描述

Hugging Face2026-04-29 更新2026-05-03 收录

下载链接：

https://hf-mirror.com/datasets/RyanYr/grpo-dapo_shuffled-01_offline-grpo-dapo-qwen3-4B-Base-mbs128-n4_llama3.1-8B-It_matheval

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是一个用于AI模型训练和评估的结构化数据集，包含多个特征字段：data_source（数据来源）、problem（问题描述）、solution（解决方案）、answer（答案）、prompt（提示信息，由角色和内容组成）、reward_model（奖励模型信息，包括ground_truth和style）和responses（响应列表）。数据集分为多个分片，包括mixed.10至mixed.90和hard.10至hard.90，可能表示不同难度级别或数据混合比例的子集，每个分片有特定的字节大小和示例数量。整体上，数据集旨在支持自然语言处理任务，如问题解答、对话生成和奖励建模。

This dataset is a structured dataset for AI model training and evaluation, featuring multiple fields: data_source (data source), problem (problem description), solution (solution), answer (answer), prompt (prompt information, consisting of role and content), reward_model (reward model information, including ground_truth and style), and responses (list of responses). The dataset is divided into multiple splits, such as mixed.10 to mixed.90 and hard.10 to hard.90, likely representing subsets with different difficulty levels or data mixing ratios, each with specific byte sizes and example counts. Overall, the dataset is designed to support natural language processing tasks, such as question answering, dialogue generation, and reward modeling.

提供机构：

RyanYr