
jeremycochoy/contrastive-training-small-bundles

Source: Hugging Face | Updated: 2026-04-17 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/jeremycochoy/contrastive-training-small-bundles
Description:
# small_mixed_v1

Pre-shuffled, pre-mixed training bundle for the Small tier (42.5M params) of the contrastive forecasting model family.

- **Total rows:** 27282687
- **Number of shards:** 2736

## Per-source row counts (source_id -> rows)

- 0 gift: 20250495 (74.2%)
- 1 wiki_hourly: 3715121 (13.6%)
- 2 wiki_daily: 1990244 (7.3%)
- 3 wiki_stl_residual: 530731 (1.9%)
- 4 wiki_stl_seasonal: 371512 (1.4%)
- 5 wiki_stl_trend: 159219 (0.6%)
- 6 synthetic: 265365 (1.0%)

## Schema

| Column | Type | Notes |
|---|---|---|
| | | Fixed-length window |
| | | 0=gift, 1..5=wiki sub-sources, 6=synthetic |
| | | Source-specific metadata |

## Shuffling and sampling

Globally shuffled via two-pass bucket shuffle (stage 3). Every output shard is a statistically uniform random sample of the entire input. Max mix-ratio delta between first and last shard: 0.009.

GIFT per-file sampling is byte-weighted. See the generating repo () for details.

NaN rate: 0.06% all-NaN, 0.22% partial-NaN. Consumers should forward-fill partial-NaN rows and skip all-NaN rows.

Generated by (PRs #194-#202).
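The card asks consumers to forward-fill partial-NaN rows and skip all-NaN rows. Below is a minimal sketch of that filter, assuming each row's fixed-length window has already been loaded as a plain Python list of floats. The card does not name the window column, so `forward_fill` and `clean_windows` are hypothetical helpers, not part of the dataset's tooling. The card also does not say what to do with NaNs that come before the first observed value; this sketch back-fills them with the first observed value, which is an assumption.

```python
import math
from typing import Iterable, Iterator, List


def forward_fill(window: List[float]) -> List[float]:
    """Replace each NaN with the most recent non-NaN value in the window.

    Assumption (not from the dataset card): leading NaNs, which have no
    earlier value to carry forward, are filled with the first observed value.
    """
    first = next((v for v in window if not math.isnan(v)), None)
    if first is None:
        raise ValueError("all-NaN window; callers should skip it instead")
    filled: List[float] = []
    last = first
    for v in window:
        if not math.isnan(v):
            last = v  # remember the latest observed value
        filled.append(last)
    return filled


def clean_windows(windows: Iterable[List[float]]) -> Iterator[List[float]]:
    """Skip all-NaN windows and forward-fill partially-NaN ones."""
    for w in windows:
        if all(math.isnan(v) for v in w):
            continue  # all-NaN row: drop it, as the card recommends
        if any(math.isnan(v) for v in w):
            yield forward_fill(w)  # partial-NaN row: repair it
        else:
            yield w  # clean row: pass through unchanged
```

For example, with `nan = float("nan")`, feeding `[[1.0, nan, 3.0], [nan, nan], [nan, 2.0, nan]]` through `clean_windows` drops the middle row and yields the two repaired windows. Doing this as a streaming generator keeps memory flat when iterating over shards of roughly ten thousand rows each.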
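The card names a two-pass bucket shuffle but leaves the details to the generating repo. The sketch below shows the generic form of that technique, not the repo's stage 3 code. Pass 1 streams rows into uniformly random buckets, pass 2 shuffles each bucket in memory, and each bucket becomes one shard. Buckets are in-memory lists here, whereas a real pipeline would write them to disk. `two_pass_bucket_shuffle` and `mix_ratio_delta` are hypothetical helpers. The card does not define "mix-ratio delta", so reading it as the largest absolute difference in any source's row fraction between two shards is an assumption.

```python
import random
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def two_pass_bucket_shuffle(
    rows: Iterable[T], num_buckets: int, seed: int = 0
) -> List[List[T]]:
    """Generic two-pass bucket shuffle (illustrative, in memory)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    rng = random.Random(seed)
    buckets: List[List[T]] = [[] for _ in range(num_buckets)]
    # Pass 1: send every row to an independently, uniformly chosen bucket.
    for row in rows:
        buckets[rng.randrange(num_buckets)].append(row)
    # Pass 2: shuffle each bucket on its own; each bucket becomes a shard.
    for bucket in buckets:
        rng.shuffle(bucket)
    return buckets


def mix_ratio_delta(shard_a: Sequence[Hashable], shard_b: Sequence[Hashable]) -> float:
    """Largest absolute gap in per-source fraction between two shards.

    Each shard is given as a sequence of source ids (e.g. 0..6).
    Assumption: this is one plausible reading of "mix-ratio delta".
    """
    if not shard_a or not shard_b:
        raise ValueError("shards must be non-empty")
    ca, cb = Counter(shard_a), Counter(shard_b)
    sources = set(ca) | set(cb)
    return max(abs(ca[s] / len(shard_a) - cb[s] / len(shard_b)) for s in sources)
```

Because each row's bucket is chosen independently and uniformly, every shard is a uniform random sample of the input, so per-source fractions should agree across shards up to sampling noise. Calling `mix_ratio_delta(shards[0], shards[-1])` on the source ids of the first and last shard gives a number comparable in spirit to the 0.009 reported above.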
Provided by:
jeremycochoy