imdatta0/qwen3-4b-2048-linear-grpo-bank
Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/imdatta0/qwen3-4b-2048-linear-grpo-bank
Resource description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: legal_moves
    dtype: string
  - name: action_rewards
    dtype: string
  - name: action_advantages
    dtype: string
  - name: best_action
    dtype: string
  - name: board_flat
    dtype: string
  - name: score_before
    dtype: int64
  - name: moves_before
    dtype: int64
  - name: max_tile_before
    dtype: int64
  - name: empty_before
    dtype: int64
  - name: source
    dtype: string
  - name: stage_bucket
    dtype: string
  - name: prompt_style
    dtype: string
  splits:
  - name: train
    num_bytes: 439488
    num_examples: 640
  - name: all_generated
    num_bytes: 1341032
    num_examples: 1952
  - name: selection_64
    num_bytes: 44024
    num_examples: 64
  - name: promotion_256
    num_bytes: 175998
    num_examples: 256
  download_size: 524356
  dataset_size: 2000542
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: all_generated
    path: data/all_generated-*
  - split: selection_64
    path: data/selection_64-*
  - split: promotion_256
    path: data/promotion_256-*
---
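
The split statistics in the card can be cross-checked with a little arithmetic. The sketch below copies the `num_bytes` and `num_examples` values from the card into a dictionary and confirms that the per-split byte counts add up to the declared `dataset_size`. The names `SPLITS`, `DATASET_SIZE`, `total_bytes`, and `mean_row_bytes` are illustrative and not part of the dataset.

```python
# Split statistics copied from the dataset_info block of the card.
SPLITS = {
    "train": {"num_bytes": 439488, "num_examples": 640},
    "all_generated": {"num_bytes": 1341032, "num_examples": 1952},
    "selection_64": {"num_bytes": 44024, "num_examples": 64},
    "promotion_256": {"num_bytes": 175998, "num_examples": 256},
}
DATASET_SIZE = 2000542  # dataset_size as declared in the card


def total_bytes(splits):
    """Sum num_bytes over all splits."""
    return sum(s["num_bytes"] for s in splits.values())


def mean_row_bytes(splits):
    """Average uncompressed bytes per example, per split."""
    return {
        name: s["num_bytes"] / s["num_examples"]
        for name, s in splits.items()
    }


# The card is internally consistent if these two numbers agree.
print(total_bytes(SPLITS), DATASET_SIZE)
for name, size in mean_row_bytes(SPLITS).items():
    print(f"{name}: {size:.1f} bytes/row")
```

Every split works out to roughly 687 bytes per row, which suggests the four splits share a similar row layout. Whether the smaller splits are subsets of `all_generated` is not stated in the card.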
Provider:
imdatta0
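
A minimal sketch of how one of the splits declared in the card might be loaded, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `data_glob` and `load_split` are hypothetical. The card types columns such as `legal_moves` and `action_rewards` as plain strings and does not say how they are encoded, so the sketch returns them unparsed.

```python
REPO_ID = "imdatta0/qwen3-4b-2048-linear-grpo-bank"
SPLIT_NAMES = ("train", "all_generated", "selection_64", "promotion_256")


def data_glob(split):
    """Return the file pattern the card's configs block maps to a split."""
    if split not in SPLIT_NAMES:
        raise ValueError(f"unknown split: {split!r}")
    return f"data/{split}-*"


def load_split(split="train"):
    """Load one split from the Hub (needs network access and `datasets`)."""
    data_glob(split)  # validate the split name before doing any work
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=split)


# Usage (not executed here because it downloads data):
#   ds = load_split("selection_64")
#   print(ds[0]["prompt"], ds[0]["best_action"])
```

Validating the split name first keeps a typo from triggering a download attempt.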



