arvindcr4/tinker-rl-bench-wandb

Name: arvindcr4/tinker-rl-bench-wandb
Creator: arvindcr4
Published: 2026-04-19 15:58:33
License: Apache 2.0

Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/arvindcr4/tinker-rl-bench-wandb

Description:

---
license: apache-2.0
task_categories:
- text-generation
- reinforcement-learning
tags:
- wandb
- rlhf
- grpo
- ppo
- reinforcement-learning-from-human-feedback
- training-logs
language:
- en
pretty_name: TinkerRL-Bench W&B Run Archive
size_categories:
- 1K<n<10K
configs:
- config_name: runs
  data_files: runs.jsonl
- config_name: history
  data_files: history.jsonl
---

# TinkerRL-Bench W&B Run Archive

Full export of every Weights & Biases run under the `arvindcr4-pes-university` entity, covering the experiments reported in our NeurIPS submission *"A Unified Benchmark for RL Post-Training of Language Models"* ([repo](https://github.com/pes-llm-research/tinker-rl-lab)).

## Contents

| File | Rows | Description |
|------|------|-------------|
| `runs.jsonl` | 334 | One record per run: `project`, `run_id`, `run_name`, `state`, `config`, `summary`, `tags`, `url`, `runtime` |
| `history.jsonl` | 9,255 | Per-step metric history (step, reward, loss, accuracy, etc.) joined to `run_id` |

## Projects covered

| Project | Runs | What it contains |
|---|---|---|
| `tinker-rl-lab-world-class` | 171 | Frontier/architectural GSM8K campaigns (Kimi-K2, GPT-OSS-20B, Qwen3-235B, DeepSeek-V3.1, Nemotron-120B, Llama-8B-Instruct, MoE variants) |
| `tinker-structural-ceiling` | 72 | Structural-ceiling sweep across Qwen3 / Llama / Gemma base + instruct, learning-rate and group-size ablations |
| `tinker-rl-scaling` | 88 | Scaling / seed ablations of Qwen3 {0.6B, 1.7B, 4B, 8B, 14B, 30B-MoE} on GSM8K |
| `skyrl-tinker` | 3 | Qwen3-8B tool-use SkyRL runs |

## How to load

```python
from datasets import load_dataset
import pandas as pd

runs = load_dataset("arvindcr4/tinker-rl-bench-wandb", "runs", split="train")
history = load_dataset("arvindcr4/tinker-rl-bench-wandb", "history", split="train")

# e.g. last-10-avg reward for each finished run
df_h = history.to_pandas()
df_h = df_h.dropna(subset=["_step", "reward"])
per_run = df_h.sort_values("_step").groupby("run_id").tail(10) \
    .groupby("run_id")["reward"].mean()
```

## Citation

```bibtex
@misc{tinkerrlbench2026,
  title  = {A Unified Benchmark for RL Post-Training of Language Models},
  author = {Arvind, C. R. and Jeyaraj, Sandhya},
  year   = {2026},
  note   = {NeurIPS submission, see https://github.com/pes-llm-research/tinker-rl-lab}
}
```

## License

Apache 2.0.
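
## Example: restricting the last-10 average to finished runs

The loading snippet in "How to load" averages the last 10 logged rewards per run over the history table. Restricting that to finished runs also needs the `state` field from the runs table. The following is a minimal sketch of one way to do that join using only the standard library, working directly on records shaped like those in `runs.jsonl` and `history.jsonl`. The helper names `read_jsonl` and `last_k_mean_reward` are hypothetical, and the state value `"finished"` is an assumption about how W&B labels completed runs in this export; check the actual values in `runs.jsonl` before relying on it.

```python
import json
import math
from collections import defaultdict
from statistics import mean


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def _missing(value):
    """True for None or NaN, mirroring what pandas' dropna would discard."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def last_k_mean_reward(runs, history, k=10, state="finished"):
    """Mean of the last k logged rewards per run, for runs in `state`.

    `runs` and `history` are iterables of dicts with the field names used
    in the dataset card: `run_id`, `state`, `_step`, `reward`.
    The default state value "finished" is an assumption.
    """
    keep = {r["run_id"] for r in runs if r.get("state") == state}
    by_run = defaultdict(list)
    for row in history:
        step, reward = row.get("_step"), row.get("reward")
        # Skip rows from other runs and rows without a step or reward.
        if row.get("run_id") in keep and not _missing(step) and not _missing(reward):
            by_run[row["run_id"]].append((step, reward))
    result = {}
    for run_id, points in by_run.items():
        points.sort(key=lambda p: p[0])  # order by step
        result[run_id] = mean(reward for _, reward in points[-k:])
    return result
```

With local copies of the two files, a call might look like `last_k_mean_reward(read_jsonl("runs.jsonl"), read_jsonl("history.jsonl"))`. Runs that logged fewer than `k` rewards are averaged over whatever they have, which matches the behavior of `tail(10)` in the pandas version.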

Provided by:

arvindcr4
