arithmetic-circuit-overloading/synthetic-dataset-v2-2d-5M-500K-0.1-reverse
Hugging Face · Updated 2026-04-03 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/arithmetic-circuit-overloading/synthetic-dataset-v2-2d-5M-500K-0.1-reverse
Resource summary:
---
dataset_info:
- config_name: '100'
  features:
  - name: _id
    dtype: string
  - name: base_operation
    dtype: string
  - name: target_operation
    dtype: string
  - name: fs_examples
    list: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 1141569089
    num_examples: 5000000
  - name: validation
    num_bytes: 112657742
    num_examples: 500000
  download_size: 526024425
  dataset_size: 1254226831
- config_name: '50'
  features:
  - name: _id
    dtype: string
  - name: base_operation
    dtype: string
  - name: target_operation
    dtype: string
  - name: fs_examples
    list: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 1141804468
    num_examples: 5000000
  - name: validation
    num_bytes: 112680624
    num_examples: 500000
  download_size: 815326812
  dataset_size: 1254485092
- config_name: '75'
  features:
  - name: _id
    dtype: string
  - name: base_operation
    dtype: string
  - name: target_operation
    dtype: string
  - name: fs_examples
    list: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 1141686630
    num_examples: 5000000
  - name: validation
    num_bytes: 112679577
    num_examples: 500000
  download_size: 801101053
  dataset_size: 1254366207
- config_name: '90'
  features:
  - name: _id
    dtype: string
  - name: base_operation
    dtype: string
  - name: target_operation
    dtype: string
  - name: fs_examples
    list: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 1141621304
    num_examples: 5000000
  - name: validation
    num_bytes: 112665660
    num_examples: 500000
  download_size: 776956774
  dataset_size: 1254286964
- config_name: '95'
  features:
  - name: _id
    dtype: string
  - name: base_operation
    dtype: string
  - name: target_operation
    dtype: string
  - name: fs_examples
    list: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 1141598685
    num_examples: 5000000
  - name: validation
    num_bytes: 112660196
    num_examples: 500000
  download_size: 746159317
  dataset_size: 1254258881
- config_name: '99'
  features:
  - name: _id
    dtype: string
  - name: base_operation
    dtype: string
  - name: target_operation
    dtype: string
  - name: fs_examples
    list: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 1141575004
    num_examples: 5000000
  - name: validation
    num_bytes: 112658227
    num_examples: 500000
  download_size: 537604053
  dataset_size: 1254233231
configs:
- config_name: '100'
  data_files:
  - split: train
    path: 100/train-*
  - split: validation
    path: 100/validation-*
- config_name: '50'
  data_files:
  - split: train
    path: 50/train-*
  - split: validation
    path: 50/validation-*
- config_name: '75'
  data_files:
  - split: train
    path: 75/train-*
  - split: validation
    path: 75/validation-*
- config_name: '90'
  data_files:
  - split: train
    path: 90/train-*
  - split: validation
    path: 90/validation-*
- config_name: '95'
  data_files:
  - split: train
    path: 95/train-*
  - split: validation
    path: 95/validation-*
- config_name: '99'
  data_files:
  - split: train
    path: 99/train-*
  - split: validation
    path: 99/validation-*
---
Dataset information:
This dataset contains six configurations, named `100`, `50`, `75`, `90`, `95`, and `99`.
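As a rough sketch of how one configuration might be loaded, the helper below assumes the Hugging Face `datasets` library is installed and that the repository ID is the one in the page title. `load_config`, `REPO_ID`, `KNOWN_CONFIGS`, and `KNOWN_SPLITS` are illustrative names, not part of the dataset. The configuration names are strings (the YAML quotes them as `'100'`, `'50'`, and so on), so the sketch expects `"100"` rather than the integer `100`.

```python
# Hypothetical helper for loading one configuration of this dataset.
# Assumes `pip install datasets`; the repo ID is taken from the page title.
REPO_ID = "arithmetic-circuit-overloading/synthetic-dataset-v2-2d-5M-500K-0.1-reverse"
KNOWN_CONFIGS = ("100", "50", "75", "90", "95", "99")  # names are strings, not ints
KNOWN_SPLITS = ("train", "validation")


def load_config(config: str, split: str = "validation", streaming: bool = True):
    """Load one split of one configuration.

    Streaming is on by default so records can be iterated without first
    downloading the whole split (each train split holds 5,000,000 examples).
    """
    if config not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {KNOWN_CONFIGS}")
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {KNOWN_SPLITS}")
    # Imported lazily so the argument checks above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split, streaming=streaming)
```

With streaming enabled, `next(iter(load_config("100")))` should return the first validation record as a dict keyed by the dataset's seven feature names.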
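### Features shared by every configuration
Each configuration contains the following seven feature fields:
1. `_id`: string; unique identifier
2. `base_operation`: string; the base operation
3. `target_operation`: string; the target operation
4. `fs_examples`: list of strings (`list[string]`); few-shot examples
5. `question`: string; the question
6. `answer`: string; the answer
7. `prompt`: string; the prompt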
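To check that a record has the shape listed above, a small standard-library validator is enough. `SCHEMA` and `schema_errors` are illustrative names. Mapping `string` to Python `str` and `list[string]` to a `list` of `str` is an assumption about how records look once decoded into Python (which is how the `datasets` library usually returns such features).

```python
from typing import Any

# Expected Python types, transcribed from the feature list above.
# `fs_examples` is the only list-valued field; every other field is a string.
SCHEMA: dict[str, type] = {
    "_id": str,
    "base_operation": str,
    "target_operation": str,
    "fs_examples": list,
    "question": str,
    "answer": str,
    "prompt": str,
}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            got = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    # fs_examples is list[string], so each element must be a string as well.
    fs = record.get("fs_examples")
    if isinstance(fs, list):
        errors.extend(
            f"fs_examples[{i}]: expected str"
            for i, item in enumerate(fs)
            if not isinstance(item, str)
        )
    # Flag fields that the schema does not mention.
    errors.extend(f"unexpected field: {f}" for f in sorted(set(record) - set(SCHEMA)))
    return errors
```

The checks are deliberately structural only: they say nothing about whether `answer` is actually correct for `question`.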
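### Splits and statistics
Each configuration is divided into two splits, train and validation. All sizes are in bytes; the dataset size is the sum of the two split sizes. Per-configuration statistics:
1. Configuration `100`:
   - Train: 1141569089 bytes, 5000000 examples
   - Validation: 112657742 bytes, 500000 examples
   - Download size: 526024425 bytes
   - Dataset size: 1254226831 bytes
2. Configuration `50`:
   - Train: 1141804468 bytes, 5000000 examples
   - Validation: 112680624 bytes, 500000 examples
   - Download size: 815326812 bytes
   - Dataset size: 1254485092 bytes
3. Configuration `75`:
   - Train: 1141686630 bytes, 5000000 examples
   - Validation: 112679577 bytes, 500000 examples
   - Download size: 801101053 bytes
   - Dataset size: 1254366207 bytes
4. Configuration `90`:
   - Train: 1141621304 bytes, 5000000 examples
   - Validation: 112665660 bytes, 500000 examples
   - Download size: 776956774 bytes
   - Dataset size: 1254286964 bytes
5. Configuration `95`:
   - Train: 1141598685 bytes, 5000000 examples
   - Validation: 112660196 bytes, 500000 examples
   - Download size: 746159317 bytes
   - Dataset size: 1254258881 bytes
6. Configuration `99`:
   - Train: 1141575004 bytes, 5000000 examples
   - Validation: 112658227 bytes, 500000 examples
   - Download size: 537604053 bytes
   - Dataset size: 1254233231 bytes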
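The figures above can be cross-checked with a few lines of Python. This sketch only re-enters the numbers from this page; it confirms that each dataset size equals the train plus validation byte counts and compares the download size with the dataset size. Reading the smaller download size as compression of the stored data files is an assumption; this page does not say how the files are encoded.

```python
# Per-configuration figures copied from the statistics above (sizes in bytes).
# Tuple layout: (train_bytes, train_examples, val_bytes, val_examples,
#                download_size, dataset_size)
STATS = {
    "100": (1141569089, 5000000, 112657742, 500000, 526024425, 1254226831),
    "50": (1141804468, 5000000, 112680624, 500000, 815326812, 1254485092),
    "75": (1141686630, 5000000, 112679577, 500000, 801101053, 1254366207),
    "90": (1141621304, 5000000, 112665660, 500000, 776956774, 1254286964),
    "95": (1141598685, 5000000, 112660196, 500000, 746159317, 1254258881),
    "99": (1141575004, 5000000, 112658227, 500000, 537604053, 1254233231),
}


def summarize(stats: dict) -> dict:
    """Derive a few sanity-check figures for each configuration."""
    summary = {}
    for name, (tr_bytes, tr_n, va_bytes, va_n, download, dataset) in stats.items():
        summary[name] = {
            # dataset_size should equal the sum of the two split sizes
            "sizes_add_up": tr_bytes + va_bytes == dataset,
            "train_bytes_per_example": tr_bytes / tr_n,
            "val_bytes_per_example": va_bytes / va_n,
            # fraction of the stored dataset size that has to be downloaded
            "download_ratio": download / dataset,
        }
    return summary


for name, row in summarize(STATS).items():
    print(
        f"{name:>3}: sizes_add_up={row['sizes_add_up']}, "
        f"{row['train_bytes_per_example']:.1f} bytes/example, "
        f"download/dataset={row['download_ratio']:.3f}"
    )
```

On these numbers every configuration stores about 228 bytes per training example, while the download-to-dataset ratio ranges from roughly 0.42 (`100`) to roughly 0.65 (`50`).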
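### Data file paths
The data files for each configuration are located as follows:
1. Configuration `100`: train split at `100/train-*`, validation split at `100/validation-*`
2. Configuration `50`: train split at `50/train-*`, validation split at `50/validation-*`
3. Configuration `75`: train split at `75/train-*`, validation split at `75/validation-*`
4. Configuration `90`: train split at `90/train-*`, validation split at `90/validation-*`
5. Configuration `95`: train split at `95/train-*`, validation split at `95/validation-*`
6. Configuration `99`: train split at `99/train-*`, validation split at `99/validation-*`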
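The `train-*` and `validation-*` entries are glob patterns relative to the repository root. The sketch below approximates how such a pattern selects files using Python's `fnmatch.fnmatchcase`; the real resolution is done by the Hugging Face tooling and may differ in details. `data_file_pattern`, `files_for`, and the shard file names in `example_files` are hypothetical, since this page does not list the actual files.

```python
from fnmatch import fnmatchcase

CONFIGS = ("100", "50", "75", "90", "95", "99")
SPLITS = ("train", "validation")


def data_file_pattern(config: str, split: str) -> str:
    """Build the `<config>/<split>-*` glob listed above for one config/split."""
    if config not in CONFIGS or split not in SPLITS:
        raise ValueError(f"unknown config/split: {config!r}/{split!r}")
    return f"{config}/{split}-*"


def files_for(config: str, split: str, repo_files: list[str]) -> list[str]:
    """Return the repository paths (sorted) matched by that glob."""
    pattern = data_file_pattern(config, split)
    # fnmatchcase skips OS-specific case folding; note that its `*` also matches `/`.
    return sorted(path for path in repo_files if fnmatchcase(path, pattern))


# Hypothetical repository listing; the real shard names are not given on this page.
example_files = [
    "README.md",
    "100/train-00000-of-00002.parquet",
    "100/train-00001-of-00002.parquet",
    "100/validation-00000-of-00001.parquet",
    "50/train-00000-of-00002.parquet",
]
print(files_for("100", "train", example_files))
```

Because the match is against the whole path, `50/train-*` does not pick up files under a directory such as `150/`.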
Provided by:
arithmetic-circuit-overloading



