open-athena/isoflop-experiments
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/open-athena/isoflop-experiments
Resource description:
---
license: apache-2.0
task_categories:
- other
---
# IsoFLOP Scaling Law Experiments
Curated collection of IsoFLOP curve data from 7 experiments, standardized to a common schema.
This dataset is associated with the paper [Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits](https://huggingface.co/papers/2603.22339).
- **Project Page:** [https://openathena.ai/scaling-law-analysis](https://openathena.ai/scaling-law-analysis/)
- **Data Extraction & Prep:** [Open-Athena/scaling-law-analysis](https://github.com/Open-Athena/scaling-law-analysis)
- **Scaling Law Estimation:** [Open-Athena/vpnls](https://github.com/Open-Athena/vpnls)
## Schema
| Field | Type | Description |
|---|---|---|
| `source` | string | Data source identifier. One of: `ml_scalefit`, `epochai_chinchilla`, `llama_3`, `marin_202603`, `misfitting`. |
| `dataset` | string | Training dataset. One of: `massivetext`, `llama_3`, `comma`, `dclm`, `nemotron`, `fineweb_c4`. |
| `model` | string | Model architecture. One of: `chinchilla`, `llama_3`, `llama_2`, `transformer`. |
| `experiment` | string | Canonical identifier. Defined as `source__dataset__model`, with repeated components collapsed (e.g. `llama_3` rather than `llama_3__llama_3__llama_3`). |
| `tokens` | float | Training tokens (`D`). Either from the source data or derived via `D = C / (6N)`. |
| `params` | float | Model parameter count (`N`). Either from the source data or derived via `N = C / (6D)`. |
| `budget` | float | Compute budget in FLOPs (`C`). |
| `loss` | float | Validation loss. Underlying source varies by experiment. |
Each row is uniquely identified by `(experiment, tokens, params, budget)`.
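The relationship between `tokens`, `params`, and `budget` can be illustrated with a minimal sketch. The rows below are invented for illustration and the helper names are hypothetical; only the `C = 6ND` identities and the uniqueness key come from the schema above.

```python
# Hypothetical rows in the schema above; all values are invented for illustration.
rows = [
    {"experiment": "demo__a__b", "params": 1.0e8, "budget": 6.0e18, "loss": 3.10},
    {"experiment": "demo__a__b", "params": 2.0e8, "budget": 6.0e18, "loss": 3.02},
    {"experiment": "demo__a__b", "params": 4.0e8, "budget": 6.0e18, "loss": 3.07},
]


def derive_tokens(budget, params):
    """D = C / (6N), used when the source reports only budget and params."""
    return budget / (6.0 * params)


def derive_params(budget, tokens):
    """N = C / (6D), used when the source reports only budget and tokens."""
    return budget / (6.0 * tokens)


# Fill in the missing `tokens` column.
for row in rows:
    row["tokens"] = derive_tokens(row["budget"], row["params"])

# Check that (experiment, tokens, params, budget) identifies each row uniquely.
keys = [(r["experiment"], r["tokens"], r["params"], r["budget"]) for r in rows]
keys_are_unique = len(set(keys)) == len(keys)
```

Because one of `tokens` or `params` may be derived from the other two fields, `6 * params * tokens` reproduces `budget` exactly (up to floating-point error) for such rows.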
## Usage
Fit Chinchilla scaling law parameters from this dataset using [vpnls](https://github.com/Open-Athena/vpnls):
```python
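from datasets import load_dataset
from vpnls.api import fit_vpnls

# Load one experiment and unpack its columns as arrays:
# N = params, D = tokens, L = loss (in the order given to `filter`).
N, D, L = (
    load_dataset('open-athena/isoflop-experiments', split='train').to_pandas()
    .query("experiment == 'ml_scalefit__massivetext__chinchilla'")
    .filter(items=['params', 'tokens', 'loss']).values.copy().T
)

# Fit the scaling law parameters with vpnls.
result = fit_vpnls(N, D, L)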
print(f'α={result.alpha:.4f}, β={result.beta:.4f}, E={result.E:.4f}, A={result.A:.4f}, B={result.B:.4f}')
# α=0.3900, β=0.4300, E=1.9160, A=999.8009, B=7944.6131
```
See [vpnls#usage](https://github.com/Open-Athena/vpnls?tab=readme-ov-file#usage) for more examples.
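Once parameters are fitted, they can be turned into loss predictions and a compute-optimal allocation. The sketch below assumes the standard Chinchilla parametric form `L(N, D) = E + A / N^α + B / D^β` and the constraint `C = 6ND`; the function names are hypothetical and are not part of vpnls. The parameter values are copied from the example output above, and the `1e21` FLOP budget is an arbitrary choice.

```python
def chinchilla_loss(n_params, n_tokens, E, A, B, alpha, beta):
    """Assumed parametric form: L = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**alpha + B / n_tokens**beta


def compute_optimal(budget, A, B, alpha, beta):
    """Closed-form minimizer of the loss above subject to C = 6 N D."""
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = G * (budget / 6.0) ** (beta / (alpha + beta))
    d_opt = budget / (6.0 * n_opt)
    return n_opt, d_opt


# Fitted values from the printed example; the budget is an illustrative choice.
fit = dict(E=1.9160, A=999.8009, B=7944.6131, alpha=0.39, beta=0.43)
budget = 1e21

n_opt, d_opt = compute_optimal(budget, fit["A"], fit["B"], fit["alpha"], fit["beta"])
loss_opt = chinchilla_loss(n_opt, d_opt, **fit)
print(f"N_opt={n_opt:.3e}, D_opt={d_opt:.3e}, loss={loss_opt:.4f}")
```

The closed form follows from setting the derivative of the loss with respect to `N` to zero along the curve `D = C / (6N)`.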
## Summary
| Experiment | Points | Budgets | Reference | Collection Method |
|---|---|---|---|---|
| `ml_scalefit__massivetext__chinchilla` | 124 | 9 | [arxiv:2507.09404](https://arxiv.org/abs/2507.09404) | GitHub CSV |
| `epochai_chinchilla__massivetext__chinchilla` | 123 | 9 | [arxiv:2404.10102](https://arxiv.org/abs/2404.10102) | SVG digitization |
| `llama_3` | 133 | 10 | [arxiv:2407.21783](https://arxiv.org/abs/2407.21783) | SVG digitization |
| `marin_202603__comma__llama_2` | 85 | 7 | [W&B report](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ) | W&B export |
| `marin_202603__dclm__llama_2` | 85 | 7 | [W&B report](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ) | W&B export |
| `marin_202603__nemotron__llama_2` | 88 | 8 | [W&B report](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ) | W&B export |
| `misfitting__fineweb_c4__transformer` | 176 | 26 | [arxiv:2502.18969](https://arxiv.org/abs/2502.18969) | Checkpoint interpolation |
| **Total** | **814** | | | |
## Experiment Details
### ml_scalefit
Chinchilla training data as distributed in the `apple/ml-scalefit` repository ([arxiv:2507.09404](https://arxiv.org/abs/2507.09404)). Raw data: [`apple/ml-scalefit/data/chinchilla.csv`](https://github.com/apple/ml-scalefit/blob/ac4664af5db6c94e6ac7521a61dd3bbb0d91cc3a/data/chinchilla.csv) with columns `model_size` (`N`), `n_tokens` (`D`), `loss`. Budget `C = 6ND` is computed and snapped to the 9 Chinchilla IsoFLOP levels (`6e18` to `3e21`); points more than 10% away from the nearest budget are discarded. `N`, `D`, and loss are kept as-is.
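The snapping step might look like the following sketch. The text gives only the number of levels and their endpoints, so the intermediate budget values listed here are an assumption, as are the choice of nearest level in log space and the use of relative error for the 10% threshold. The function name is hypothetical.

```python
import math

# Assumed IsoFLOP levels: only the count (9) and the endpoints (6e18, 3e21)
# are stated in the text; the intermediate values are an assumption.
BUDGETS = [6e18, 1e19, 3e19, 6e19, 1e20, 3e20, 6e20, 1e21, 3e21]


def snap_budget(n_params, n_tokens, budgets=BUDGETS, tol=0.10):
    """Return the nearest budget level, or None if C = 6ND is too far from it."""
    c = 6.0 * n_params * n_tokens
    # Nearest level in log space (assumption; the levels span several decades).
    nearest = min(budgets, key=lambda b: abs(math.log(c / b)))
    # Discard points whose relative distance exceeds the tolerance.
    if abs(c - nearest) / nearest > tol:
        return None
    return nearest


print(snap_budget(1.0e8, 1.05e10))  # C = 6.3e18, within 10% of 6e18
print(snap_budget(1.0e8, 1.30e10))  # C = 7.8e18, more than 10% from any level
```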
### epochai_chinchilla
Independent extraction of the same Chinchilla experiments by Besiroglu et al. ([arxiv:2404.10102](https://arxiv.org/abs/2404.10102)), digitized from SVG figures in the original paper. Raw data: [`epoch-research/analyzing-chinchilla/data/svg_extracted_data.csv`](https://github.com/epoch-research/analyzing-chinchilla/blob/92258837425e1b5f2851d624287f0120583a3d0e/data/svg_extracted_data.csv) with columns `Model Size` (`N`), `Training FLOP` (`C`), `loss`. `N` and `C` are rounded to integers (SVG artifact). `C` is snapped to the same 9 budgets as ml_scalefit; near-duplicates from SVG extraction are resolved by keeping the point closest to the target budget. `D` is derived as `C / (6N)`.
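A rough sketch of the near-duplicate resolution follows. What counts as a near-duplicate is not specified above, so the rule used here (same snapped budget and parameter counts within 1% of each other) is an assumption, as is deriving `D` from the snapped rather than the raw budget. The field and function names are hypothetical.

```python
def resolve_near_duplicates(points, rel_tol=0.01):
    """Keep one point per cluster of near-identical model sizes at each budget.

    Each point is a dict with `params` (N), `flops` (raw C), `budget`
    (snapped C), and `loss`. Within a cluster, the point whose raw FLOPs
    are closest to the snapped budget is kept.
    """
    kept = []
    for p in sorted(points, key=lambda q: (q["budget"], q["params"])):
        prev = kept[-1] if kept else None
        same_cluster = (
            prev is not None
            and prev["budget"] == p["budget"]
            and abs(p["params"] - prev["params"]) <= rel_tol * prev["params"]
        )
        if not same_cluster:
            kept.append(dict(p))
        elif abs(p["flops"] - p["budget"]) < abs(prev["flops"] - prev["budget"]):
            kept[-1] = dict(p)
    # Derive tokens as D = C / (6N), using the snapped budget (assumption).
    for p in kept:
        p["tokens"] = p["budget"] / (6.0 * p["params"])
    return kept


# Invented points: the first two are near-duplicates of the same model size.
points = [
    {"params": 100_000_000, "flops": 6.02e18, "budget": 6e18, "loss": 3.10},
    {"params": 100_200_000, "flops": 6.10e18, "budget": 6e18, "loss": 3.10},
    {"params": 200_000_000, "flops": 5.99e18, "budget": 6e18, "loss": 3.00},
]
resolved = resolve_near_duplicates(points)
```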
### llama_3
Digitized from SVG figures in the Llama 3 technical report ([arxiv:2407.21783](https://arxiv.org/abs/2407.21783)). Raw data: [`eric-czech/llama3_isoflop_extraction/isoflops_points.csv`](https://github.com/eric-czech/llama3_isoflop_extraction/blob/1bc1755b76e6ee55a911549c8ec52b71cb480320/isoflops_points.csv) with columns `compute_budget` (`C`), `training_tokens` (`D`), `validation_loss`. `N` is derived as `C / (6D)`.
### misfitting
Scaling law survey data from Li et al. ([arxiv:2502.18969](https://arxiv.org/abs/2502.18969)). Transformers trained on FineWeb, evaluated on C4. Raw data: [`hadasah/scaling_laws/data/scaling_results.csv`](https://github.com/hadasah/scaling_laws/blob/1f3708c0a12df0effb0ee906b1da5f9f0ff4f4f1/data/scaling_results.csv) (per-checkpoint training logs). IsoFLOP curves are constructed by: (1) building a grid of 40 log-spaced budget candidates and keeping levels where ≥3 model sizes have data within 10% FLOP tolerance; (2) interpolating each run's loss at the target budgets via log-log interpolation over nearby checkpoints; (3) selecting the best learning rate per model size. `D` is derived from the target budget. This follows the interpolation approach in `hadasah/scaling_laws/paper_analysis_and_plots.py`.
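The three construction steps can be sketched as below. This is an illustration under stated assumptions, not the code from `paper_analysis_and_plots.py`: the grid endpoints, the bracketing-checkpoint interpolation rule, and all function names are hypothetical.

```python
import math


def log_spaced(lo, hi, n):
    """n values evenly spaced in log space between lo and hi, inclusive."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]


def interp_loss_loglog(flops, losses, target):
    """Interpolate log(loss) linearly in log(FLOPs) between the two
    checkpoints that bracket `target`; return None outside the logged range."""
    pairs = sorted(zip(flops, losses))
    for (c0, l0), (c1, l1) in zip(pairs, pairs[1:]):
        if c0 <= target <= c1:
            t = (math.log(target) - math.log(c0)) / (math.log(c1) - math.log(c0))
            return math.exp(math.log(l0) + t * (math.log(l1) - math.log(l0)))
    return None


def best_per_model_size(losses_by_run):
    """Given {(model_size, learning_rate): loss}, keep the lowest loss per size."""
    best = {}
    for (size, _lr), loss in losses_by_run.items():
        if loss is not None and (size not in best or loss < best[size]):
            best[size] = loss
    return best


# Step 1: 40 candidate budgets (the endpoints here are invented).
grid = log_spaced(1e17, 1e21, 40)

# Step 2: a synthetic run whose loss follows a power law in FLOPs.
checkpoints = [1e18, 4e18, 1.6e19]
run_losses = [10.0 * c ** -0.05 for c in checkpoints]
loss_at_target = interp_loss_loglog(checkpoints, run_losses, 2e18)

# Step 3: pick the best learning rate for each model size (invented values).
best = best_per_model_size({(1e8, 1e-3): 3.2, (1e8, 3e-3): 3.1, (2e8, 1e-3): 3.0})
```

Log-log interpolation is exact when loss is a pure power law in FLOPs between checkpoints, which is why it is a natural fit for training curves that look roughly linear on log-log axes.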
### marin_202603
Marin community scaling ladder experiments: Llama 2 models trained on three datasets (Comma, DCLM, Nemotron). Raw data: vendored CSVs exported from the [Marin W&B project](https://wandb.ai/marin-community/marin/reports/Scaling-Ladders--VmlldzoxNTc0MjM1NQ). Budget is parsed from run names and multiplied by 3 to convert from forward-pass FLOPs (`≈2ND`) to total FLOPs (`≈6ND`); this factor was validated empirically across all runs. "Validation-optimal" runs (which use a different FLOPs convention) are excluded. Loss is `eval/paloma/macro_loss`.
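The FLOPs conversion and its empirical check might be sketched as follows. The run-name format is not given above, so parsing is omitted and the parsed forward-pass budget is taken as an input; the function names and the example run are hypothetical.

```python
def total_flops_from_forward(forward_flops):
    """Convert forward-pass FLOPs (about 2ND) to total training FLOPs (about 6ND)."""
    return 3.0 * forward_flops


def implied_factor(n_params, n_tokens, forward_flops):
    """Ratio of 6ND to the budget parsed from a run name.

    A value close to 3 across runs supports the conversion above.
    """
    return 6.0 * n_params * n_tokens / forward_flops


# Invented run: N = 1e8 parameters, D = 1e10 tokens, run name budget = 2e18.
forward_budget = 2.0e18
total_budget = total_flops_from_forward(forward_budget)
factor = implied_factor(1.0e8, 1.0e10, forward_budget)
```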
## Citation
```bibtex
@article{openathena2026approach2,
title={Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits},
author={Czech, Eric and Xu, Zhiwei and Elmatad, Yael and Wang, Yixin and Held, William},
journal={arXiv preprint arXiv:2603.22339},
year={2026}
}
```
Provider:
open-athena



