open-athena/isoflop-experiments

Name: open-athena/isoflop-experiments
Creator: open-athena
Published: 2026-03-27 17:44:03
License: No description available

Hugging Face. Updated 2026-03-27; indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/open-athena/isoflop-experiments

Resource description:

---
license: apache-2.0
task_categories:
- other
---

# IsoFLOP Scaling Law Experiments

Curated collection of IsoFLOP curve data from 7 experiments, standardized to a common schema.

This dataset is associated with the paper [Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits](https://huggingface.co/papers/2603.22339).

- **Project Page:** [https://openathena.ai/scaling-law-analysis](https://openathena.ai/scaling-law-analysis/)
- **Data Extraction & Prep:** [Open-Athena/scaling-law-analysis](https://github.com/Open-Athena/scaling-law-analysis)
- **Scaling Law Estimation:** [Open-Athena/vpnls](https://github.com/Open-Athena/vpnls)

## Schema

| Field | Type | Description |
|---|---|---|
| `source` | string | Data source identifier. One of: `ml_scalefit`, `epochai_chinchilla`, `llama_3`, `marin_202603`, `misfitting`. |
| `dataset` | string | Training dataset. One of: `massivetext`, `llama_3`, `comma`, `dclm`, `nemotron`, `fineweb_c4`. |
| `model` | string | Model architecture. One of: `chinchilla`, `llama_3`, `llama_2`, `transformer`. |
| `experiment` | string | Canonical identifier. Defined as `source__dataset__model`, with repeated components deduplicated (e.g. `llama_3`). |
| `tokens` | float | Training tokens (`D`). Either from the source data or derived via `D = C / (6N)`. |
| `params` | float | Model parameter count (`N`). Either from the source data or derived via `N = C / (6D)`. |
| `budget` | float | Compute budget in FLOPs (`C`). |
| `loss` | float | Validation loss. Underlying source varies by experiment. |

Each row is uniquely identified by `(experiment, tokens, params, budget)`.

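
The `tokens` and `params` derivations in the schema both follow from the approximation `C = 6ND`. A minimal sketch of that arithmetic is shown below; the helper name `complete_row` and the example values are hypothetical and are not taken from the dataset or its preparation code.

```python
from typing import Optional, Tuple


def complete_row(budget: float,
                 params: Optional[float] = None,
                 tokens: Optional[float] = None) -> Tuple[float, float]:
    """Return (params, tokens), deriving the missing one from C = 6 * N * D."""
    if params is None and tokens is None:
        raise ValueError("need at least one of params or tokens")
    if tokens is None:
        tokens = budget / (6.0 * params)   # D = C / (6N)
    elif params is None:
        params = budget / (6.0 * tokens)   # N = C / (6D)
    return params, tokens


# Made-up values for illustration; these are not rows from the dataset.
n1, d1 = complete_row(budget=6e18, params=1e8)    # tokens derived
n2, d2 = complete_row(budget=6e18, tokens=2e10)   # params derived
print(n1, d1)
print(n2, d2)
```

When both values are supplied the helper returns them unchanged, which mirrors sources such as `ml_scalefit` where `N` and `D` are kept as-is.
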

## Usage

Fit Chinchilla scaling law parameters from this dataset using [vpnls](https://github.com/Open-Athena/vpnls):

```python
from datasets import load_dataset
from vpnls.api import fit_vpnls

N, D, L = (
    load_dataset('open-athena/isoflop-experiments', split='train').to_pandas()
    .query("experiment == 'ml_scalefit__massivetext__chinchilla'")
    .filter(items=['params', 'tokens', 'loss']).values.copy().T
)

result = fit_vpnls(N, D, L)
print(f'α={result.alpha:.4f}, β={result.beta:.4f}, E={result.E:.4f}, A={result.A:.4f}, B={result.B:.4f}')
# α=0.3900, β=0.4300, E=1.9160, A=999.8009, B=7944.6131
```

See [vpnls#usage](https://github.com/Open-Athena/vpnls?tab=readme-ov-file#usage) for more examples.

## Summary

| Experiment | Points | Budgets | Reference | Collection Method |
|---|---|---|---|---|
| `ml_scalefit__massivetext__chinchilla` | 124 | 9 | [arxiv:2507.09404](https://arxiv.org/abs/2507.09404) | GitHub CSV |
| `epochai_chinchilla__massivetext__chinchilla` | 123 | 9 | [arxiv:2404.10102](https://arxiv.org/abs/2404.10102) | SVG digitization |
| `llama_3` | 133 | 10 | [arxiv:2407.21783](https://arxiv.org/abs/2407.21783) | SVG digitization |
| `marin_202603__comma__llama_2` | 85 | 7 | [W&B report](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ) | W&B export |
| `marin_202603__dclm__llama_2` | 85 | 7 | [W&B report](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ) | W&B export |
| `marin_202603__nemotron__llama_2` | 88 | 8 | [W&B report](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ) | W&B export |
| `misfitting__fineweb_c4__transformer` | 176 | 26 | [arxiv:2502.18969](https://arxiv.org/abs/2502.18969) | Checkpoint interpolation |
| **Total** | **814** | | | |

## Experiment Details

### ml_scalefit

Chinchilla training data from Besiroglu et al. ([arxiv:2507.09404](https://arxiv.org/abs/2507.09404)).

Raw data: [`apple/ml-scalefit/data/chinchilla.csv`](https://github.com/apple/ml-scalefit/blob/ac4664af5db6c94e6ac7521a61dd3bbb0d91cc3a/data/chinchilla.csv) with columns `model_size` (`N`), `n_tokens` (`D`), `loss`. Budget `C = 6ND` is computed and snapped to the 9 Chinchilla IsoFLOP levels (`6e18` to `3e21`); points >10% from the nearest budget are discarded. `N`, `D`, and loss are kept as-is.

### epochai_chinchilla

Independent extraction of the same Chinchilla experiments by Besiroglu et al. ([arxiv:2404.10102](https://arxiv.org/abs/2404.10102)), digitized from SVG figures in the original paper.

Raw data: [`epoch-research/analyzing-chinchilla/data/svg_extracted_data.csv`](https://github.com/epoch-research/analyzing-chinchilla/blob/92258837425e1b5f2851d624287f0120583a3d0e/data/svg_extracted_data.csv) with columns `Model Size` (`N`), `Training FLOP` (`C`), `loss`. `N` and `C` are rounded to integers (SVG artifact). `C` is snapped to the same 9 budgets as ml_scalefit; near-duplicates from SVG extraction are resolved by keeping the point closest to the target budget. `D` is derived as `C / (6N)`.

### llama_3

Digitized from SVG figures in the Llama 3 technical report ([arxiv:2407.21783](https://arxiv.org/abs/2407.21783)).

Raw data: [`eric-czech/llama3_isoflop_extraction/isoflops_points.csv`](https://github.com/eric-czech/llama3_isoflop_extraction/blob/1bc1755b76e6ee55a911549c8ec52b71cb480320/isoflops_points.csv) with columns `compute_budget` (`C`), `training_tokens` (`D`), `validation_loss`. `N` is derived as `C / (6D)`.

### misfitting

Scaling law survey data from Marghi et al. ([arxiv:2502.18969](https://arxiv.org/abs/2502.18969)). Transformers trained on FineWeb, evaluated on C4.

Raw data: [`hadasah/scaling_laws/data/scaling_results.csv`](https://github.com/hadasah/scaling_laws/blob/1f3708c0a12df0effb0ee906b1da5f9f0ff4f4f1/data/scaling_results.csv) (per-checkpoint training logs).

IsoFLOP curves are constructed by:

1. Building a grid of 40 log-spaced budget candidates, keeping levels where ≥3 model sizes have data within 10% FLOP tolerance.
2. Interpolating each run's loss at target budgets via log-log interpolation over nearby checkpoints.
3. Selecting the best learning rate per model size.

`D` is derived from the target budget. Follows the interpolation approach in `hadasah/scaling_laws/paper_analysis_and_plots.py`.

### marin_202603

Marin community scaling ladder experiments: Llama 2 models trained on three datasets (Comma, DCLM, Nemotron).

Raw data: vendored CSVs exported from the [Marin W&B project](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ). Budget is parsed from run names and multiplied by 3 to convert from forward-pass FLOPs (`≈2ND`) to total FLOPs (`≈6ND`); this factor was validated empirically across all runs. "Validation-optimal" runs (which use a different FLOPs convention) are excluded. Loss is `eval/paloma/macro_loss`.

## Citation

```bibtex
@article{openathena2026approach2,
  title={Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits},
  author={Czech, Eric and Xu, Zhiwei and Elmatad, Yael and Wang, Yixin and Held, William},
  journal={arXiv preprint arXiv:2603.22339},
  year={2026}
}
```
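
As a rough illustration of the log-log interpolation step described for the misfitting experiment, the sketch below interpolates a run's loss at a target budget from its checkpoints. It is not the code from `hadasah/scaling_laws` or from the dataset preparation pipeline: the function name `loglog_interp`, the piecewise-linear scheme between the two bracketing checkpoints, and the example values are all assumptions made for illustration.

```python
import math
from typing import Sequence


def loglog_interp(target_flops: float,
                  flops: Sequence[float],
                  losses: Sequence[float]) -> float:
    """Interpolate loss at target_flops linearly in (log FLOPs, log loss).

    `flops` must be sorted ascending and must bracket `target_flops`.
    """
    if not flops[0] <= target_flops <= flops[-1]:
        raise ValueError("target budget is outside the checkpoint range")
    pairs = list(zip(flops, losses))
    # Walk consecutive checkpoint pairs until one brackets the target.
    for (c0, l0), (c1, l1) in zip(pairs, pairs[1:]):
        if c0 <= target_flops <= c1:
            # Fractional position of the target between c0 and c1 in log space.
            t = (math.log(target_flops) - math.log(c0)) / (math.log(c1) - math.log(c0))
            return math.exp(math.log(l0) + t * (math.log(l1) - math.log(l0)))
    raise ValueError("flops must be sorted in ascending order")


# Made-up checkpoints for one run; not values from scaling_results.csv.
checkpoint_flops = [1e18, 4e18, 1.6e19]
checkpoint_losses = [4.0, 2.0, 1.0]
loss_at_target = loglog_interp(2e18, checkpoint_flops, checkpoint_losses)
print(loss_at_target)
```

Interpolating in log-log space treats the loss between two checkpoints as a local power law in compute, so a target budget halfway between them in log space receives the geometric mean of their losses.
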
