
deqing/addition_dataset

Source: Hugging Face. Updated: 2026-03-21. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/deqing/addition_dataset
Description:
---
configs:
- config_name: test
  data_files:
  - split: test
    path: test/test-*.parquet
- config_name: 1BT
  data_files:
  - split: train
    path: 1BT/train-*.parquet
- config_name: 10BT
  data_files:
  - split: train
    path: 10BT/train-*.parquet
- config_name: 3MT-3digit
  data_files:
  - split: train
    path: 3MT-3digit/train-*.parquet
  - split: test
    path: 3MT-3digit/test-*.parquet
---

# Addition Dataset

Addition problems in the format `{a} + {b} = {c}`.

## Subsets

- **test**: 5K held-out evaluation examples (operands >= 10, i.e. min 2 digits)
- **1BT**: ~85M training examples (~1 billion tokens under Llama-3 tokenizer)
- **10BT**: ~850M training examples (~10 billion tokens)
- **3MT-3digit**: Exhaustive single-token addition: all (a, b) with a, b in [0, 999] and a+b <= 999. 500,500 ordered pairs, ~3M tokens. All of a, b, c are single tokens. Symmetry-safe train/test split (10% test).

## Deduplication

- Commutative dedup: if `a + b = c` exists, `b + a = c` is excluded (1BT/10BT)
- Test exclusion: both orderings of test-set pairs are excluded from train splits
- 3MT-3digit: both orderings always in the same split (no commutative leakage)

## Usage

```python
from datasets import load_dataset

train = load_dataset("deqing/addition_dataset", "1BT", split="train")
test = load_dataset("deqing/addition_dataset", "test", split="test")

# Single-token exhaustive (0-999, a+b<=999)
train_3d = load_dataset("deqing/addition_dataset", "3MT-3digit", split="train")
test_3d = load_dataset("deqing/addition_dataset", "3MT-3digit", split="test")
```
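
The 3MT-3digit description can be checked with a short sketch: enumerating every ordered pair with a, b in [0, 999] and a+b <= 999 gives 500,500 examples, and keying the split on the unordered pair keeps `a + b` and `b + a` together. The hash-based assignment below is an assumption for illustration only; the card does not say how the actual split was produced, and the names `MAX_VALUE`, `format_example`, and `split_of` are hypothetical.

```python
import hashlib

MAX_VALUE = 999  # operands and sums stay in [0, 999], per the subset description


def format_example(a: int, b: int) -> str:
    """Render one problem in the card's `{a} + {b} = {c}` format."""
    return f"{a} + {b} = {a + b}"


def split_of(a: int, b: int, test_fraction: float = 0.10) -> str:
    """Assign a pair to 'train' or 'test' from its unordered key.

    Hashing (min, max) instead of (a, b) means both orderings of a pair
    always receive the same split. This rule is an assumption, not the
    dataset's documented procedure.
    """
    lo, hi = min(a, b), max(a, b)
    digest = hashlib.sha256(f"{lo},{hi}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


# All ordered pairs (a, b) with a, b >= 0 and a + b <= MAX_VALUE.
pairs = [(a, b) for a in range(MAX_VALUE + 1) for b in range(MAX_VALUE + 1 - a)]

splits = {"train": [], "test": []}
for a, b in pairs:
    splits[split_of(a, b)].append(format_example(a, b))

print(len(pairs))  # 500500
```

Because the split depends only on the unordered pair, a model evaluated on `7 + 3 = 10` cannot have seen `3 + 7 = 10` during training, which is the commutative leakage the card refers to.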
Provider:
deqing