jeremycochoy/contrastive-training-small-bundles
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/jeremycochoy/contrastive-training-small-bundles
Resource description:
# small_mixed_v1
Pre-shuffled, pre-mixed training bundle for the Small tier (42.5M params)
of the contrastive forecasting model family.
- **Total rows:** 27282687
- **Number of shards:** 2736
## Per-source row counts (source_id -> rows)
- 0 gift: 20250495 (74.2%)
- 1 wiki_hourly: 3715121 (13.6%)
- 2 wiki_daily: 1990244 (7.3%)
- 3 wiki_stl_residual: 530731 (1.9%)
- 4 wiki_stl_seasonal: 371512 (1.4%)
- 5 wiki_stl_trend: 159219 (0.6%)
- 6 synthetic: 265365 (1.0%)
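
The percentages above can be re-derived from the raw counts. The following is a minimal sketch that copies the row counts from the list and recomputes each source's share; the names `SOURCE_ROWS`, `TOTAL_ROWS`, and `source_shares` are illustrative and are not part of the dataset or its tooling.

```python
# Row counts copied from the list above: source_id -> (name, rows).
SOURCE_ROWS = {
    0: ("gift", 20250495),
    1: ("wiki_hourly", 3715121),
    2: ("wiki_daily", 1990244),
    3: ("wiki_stl_residual", 530731),
    4: ("wiki_stl_seasonal", 371512),
    5: ("wiki_stl_trend", 159219),
    6: ("synthetic", 265365),
}
TOTAL_ROWS = 27282687  # "Total rows" as stated in the card


def source_shares(rows_by_source):
    """Return source_id -> (name, rows, percent of all rows)."""
    total = sum(rows for _, rows in rows_by_source.values())
    return {
        sid: (name, rows, 100.0 * rows / total)
        for sid, (name, rows) in rows_by_source.items()
    }


shares = source_shares(SOURCE_ROWS)
computed_total = sum(rows for _, rows, _ in shares.values())

for sid, (name, rows, pct) in shares.items():
    print(f"{sid} {name}: {rows} ({pct:.1f}%)")
```

The per-source counts sum to the stated total, so the list and the headline figure agree.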
## Schema
| Column | Type | Notes |
|---|---|---|
| | | Fixed-length window |
| | | 0=gift, 1..5=wiki sub-sources, 6=synthetic |
| | | Source-specific metadata |
## Shuffling and sampling
Globally shuffled via two-pass bucket shuffle (stage 3). Every output
shard is a statistically uniform random sample of the entire input.
Max mix-ratio delta between first and last shard: 0.009.
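
To illustrate the idea, here is a small in-memory sketch of a generic two-pass bucket shuffle together with one plausible reading of "mix-ratio delta". It is not the generating repo's implementation: the function names, the `(source_id, payload)` row layout, and the definition of the delta (largest absolute difference in per-source share between two shards) are all assumptions made for this example.

```python
import random
from collections import Counter


def two_pass_bucket_shuffle(rows, num_buckets, seed=0):
    """Generic two-pass bucket shuffle (illustrative, in memory)."""
    rng = random.Random(seed)
    # Pass 1: scatter every row into a uniformly random bucket.
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[rng.randrange(num_buckets)].append(row)
    # Pass 2: shuffle each bucket on its own; each bucket becomes a shard.
    for bucket in buckets:
        rng.shuffle(bucket)
    return buckets


def mix_ratios(shard, source_ids):
    """Fraction of rows in a shard that come from each source."""
    counts = Counter(source_id for source_id, _ in shard)
    n = max(len(shard), 1)
    return {sid: counts.get(sid, 0) / n for sid in source_ids}


def max_mix_ratio_delta(first, last, source_ids):
    """Assumed definition: largest per-source share gap between two shards."""
    a = mix_ratios(first, source_ids)
    b = mix_ratios(last, source_ids)
    return max(abs(a[sid] - b[sid]) for sid in source_ids)


# Toy input, deliberately sorted by source: a 70/20/10 mix of three sources.
rows = (
    [(0, i) for i in range(7000)]
    + [(1, i) for i in range(2000)]
    + [(2, i) for i in range(1000)]
)
shards = two_pass_bucket_shuffle(rows, num_buckets=4, seed=0)
delta = max_mix_ratio_delta(shards[0], shards[-1], source_ids=[0, 1, 2])
print(f"max mix-ratio delta between first and last shard: {delta:.3f}")
```

In an on-disk pipeline the buckets would typically be temporary files sized to fit in memory, so the second pass can shuffle each one independently. A small delta between the first and last shard indicates that no source is concentrated at either end of the shard sequence.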
GIFT per-file sampling is byte-weighted. See the generating repo
for details.
NaN rate: 0.06% of rows are all-NaN and 0.22% are partially NaN.
Consumers should forward-fill partial-NaN rows and skip all-NaN rows.
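
A minimal sketch of that consumer-side handling, assuming each window is available as a plain list of floats (the actual column names are not shown in the schema above, so none are used here). `forward_fill` and `clean_rows` are hypothetical helper names. Leading NaNs have no earlier value to copy, so this sketch leaves them as NaN; how to treat them is left to the consumer.

```python
import math


def forward_fill(window):
    """Replace each NaN with the most recent non-NaN value before it."""
    filled = []
    last_valid = math.nan  # nothing seen yet; leading NaNs stay NaN
    for value in window:
        if not math.isnan(value):
            last_valid = value
        filled.append(last_valid)
    return filled


def clean_rows(windows):
    """Skip all-NaN windows and forward-fill the rest."""
    for window in windows:
        if all(math.isnan(v) for v in window):
            continue  # all-NaN row: drop it
        yield forward_fill(window)


nan = math.nan
example = [
    [1.0, nan, 3.0],   # partial NaN: gap filled from the left
    [nan, nan, nan],   # all NaN: skipped
    [nan, 2.0, nan],   # leading NaN kept, trailing NaN filled
]
cleaned = list(clean_rows(example))
```

Rows without NaNs pass through unchanged, so the same generator can wrap the whole stream.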
Generated by (PRs #194-#202).
Provider:
jeremycochoy



