dongg18/CODETASK_with_instruction_pool
Source: Hugging Face. Updated 2026-04-13; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/dongg18/CODETASK_with_instruction_pool
---
dataset_info:
- config_name: BFP
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 26304705
    num_examples: 46680
  - name: validation
    num_bytes: 570148
    num_examples: 1000
  - name: test
    num_bytes: 2817473
    num_examples: 5000
  download_size: 0
  dataset_size: 29692326
- config_name: CONCODE
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 125765953
    num_examples: 100000
  - name: validation
    num_bytes: 1192155
    num_examples: 1000
  - name: test
    num_bytes: 2493229
    num_examples: 2000
  download_size: 0
  dataset_size: 129451337
- config_name: CoST
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 6248110
    num_examples: 12645
  - name: validation
    num_bytes: 133734
    num_examples: 272
  - name: test
    num_bytes: 207208
    num_examples: 410
  download_size: 0
  dataset_size: 6589052
- config_name: CodeSearchNet
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 19165860
    num_examples: 24927
  - name: validation
    num_bytes: 771432
    num_examples: 1000
  - name: test
    num_bytes: 1008829
    num_examples: 1261
  download_size: 8544530
  dataset_size: 20946121
- config_name: CodeTrans
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 8361661
    num_examples: 10300
  - name: validation
    num_bytes: 430784
    num_examples: 500
  - name: test
    num_bytes: 803860
    num_examples: 1000
  download_size: 0
  dataset_size: 9596305
- config_name: KodCode
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 252129151
    num_examples: 100000
  - name: validation
    num_bytes: 2542871
    num_examples: 1000
  - name: test
    num_bytes: 12547950
    num_examples: 5000
  download_size: 0
  dataset_size: 267219972
- config_name: RunBugRun
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 8551298
    num_examples: 10000
  - name: validation
    num_bytes: 832153
    num_examples: 972
  - name: test
    num_bytes: 836082
    num_examples: 1000
  download_size: 0
  dataset_size: 10219533
- config_name: TheVault_Csharp
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 113796459
    num_examples: 100000
  - name: validation
    num_bytes: 942458
    num_examples: 1000
  - name: test
    num_bytes: 6611232
    num_examples: 5000
  download_size: 0
  dataset_size: 121350149
configs:
- config_name: BFP
  data_files:
  - split: train
    path: BFP/train-*
  - split: validation
    path: BFP/validation-*
  - split: test
    path: BFP/test-*
- config_name: CONCODE
  data_files:
  - split: train
    path: CONCODE/train-*
  - split: validation
    path: CONCODE/validation-*
  - split: test
    path: CONCODE/test-*
- config_name: CoST
  data_files:
  - split: train
    path: CoST/train-*
  - split: validation
    path: CoST/validation-*
  - split: test
    path: CoST/test-*
- config_name: CodeSearchNet
  data_files:
  - split: train
    path: CodeSearchNet/train-*
  - split: validation
    path: CodeSearchNet/validation-*
  - split: test
    path: CodeSearchNet/test-*
- config_name: CodeTrans
  data_files:
  - split: train
    path: CodeTrans/train-*
  - split: validation
    path: CodeTrans/validation-*
  - split: test
    path: CodeTrans/test-*
- config_name: KodCode
  data_files:
  - split: train
    path: KodCode/train-*
  - split: validation
    path: KodCode/validation-*
  - split: test
    path: KodCode/test-*
- config_name: RunBugRun
  data_files:
  - split: train
    path: RunBugRun/train-*
  - split: validation
    path: RunBugRun/validation-*
  - split: test
    path: RunBugRun/test-*
- config_name: TheVault_Csharp
  data_files:
  - split: train
    path: TheVault_Csharp/train-*
  - split: validation
    path: TheVault_Csharp/validation-*
  - split: test
    path: TheVault_Csharp/test-*
---
# Dataset Card for "CODETASK_with_instruction_pool"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
---
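## Dataset information
This dataset contains 8 configs, detailed below:
### Config name (config_name): BFP
#### Feature fields (features):
- Task (task): string
- Split (split): string
- Sample ID (id): string
- Input (input): string
- Output (output): string
- Multiple outputs (outputs): sequence of strings
- Task definition (definition): sequence of strings
- Positive examples (positive_examples): string
- Negative examples (negative_examples): string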
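The feature list above can be written down as a typed record. The sketch below is only an illustration: the names `CodeTaskRecord`, `schema_errors`, and `example_row` are hypothetical helpers, not part of the dataset, and the example values are placeholders rather than real rows. The card does not say how `positive_examples` and `negative_examples` are encoded inside their strings, so the check treats them as opaque text.

```python
from typing import List, TypedDict


class CodeTaskRecord(TypedDict):
    """One row of any config, mirroring the nine listed features."""
    task: str
    split: str
    id: str
    input: str
    output: str
    outputs: List[str]      # sequence of strings
    definition: List[str]   # sequence of strings
    positive_examples: str
    negative_examples: str


STRING_FIELDS = ("task", "split", "id", "input", "output",
                 "positive_examples", "negative_examples")
SEQUENCE_FIELDS = ("outputs", "definition")


def schema_errors(row: dict) -> List[str]:
    """Return human-readable schema problems for a row (empty list if none)."""
    errors = []
    for name in STRING_FIELDS:
        if not isinstance(row.get(name), str):
            errors.append(f"{name}: expected str")
    for name in SEQUENCE_FIELDS:
        value = row.get(name)
        if not (isinstance(value, list)
                and all(isinstance(item, str) for item in value)):
            errors.append(f"{name}: expected a list of str")
    return errors


# Hypothetical placeholder row; none of these values come from the dataset.
example_row: CodeTaskRecord = {
    "task": "BFP",
    "split": "train",
    "id": "placeholder-0",
    "input": "<source text>",
    "output": "<target text>",
    "outputs": ["<target text>"],
    "definition": ["<instruction text>"],
    "positive_examples": "",
    "negative_examples": "",
}

print(schema_errors(example_row))
```

Because every config shares the same nine fields, one check of this kind could be reused across all eight configs.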
#### Splits:
- Train (train): size in bytes (num_bytes) 26304705, number of examples (num_examples) 46680
- Validation (validation): size in bytes (num_bytes) 570148, number of examples (num_examples) 1000
- Test (test): size in bytes (num_bytes) 2817473, number of examples (num_examples) 5000
#### Download size (download_size): 0; total dataset size (dataset_size): 29692326
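### Config name (config_name): CONCODE
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 125765953, number of examples (num_examples) 100000
- Validation (validation): size in bytes (num_bytes) 1192155, number of examples (num_examples) 1000
- Test (test): size in bytes (num_bytes) 2493229, number of examples (num_examples) 2000
#### Download size (download_size): 0; total dataset size (dataset_size): 129451337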
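### Config name (config_name): CoST
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 6248110, number of examples (num_examples) 12645
- Validation (validation): size in bytes (num_bytes) 133734, number of examples (num_examples) 272
- Test (test): size in bytes (num_bytes) 207208, number of examples (num_examples) 410
#### Download size (download_size): 0; total dataset size (dataset_size): 6589052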
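### Config name (config_name): CodeSearchNet
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 19165860, number of examples (num_examples) 24927
- Validation (validation): size in bytes (num_bytes) 771432, number of examples (num_examples) 1000
- Test (test): size in bytes (num_bytes) 1008829, number of examples (num_examples) 1261
#### Download size (download_size): 8544530; total dataset size (dataset_size): 20946121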
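### Config name (config_name): CodeTrans
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 8361661, number of examples (num_examples) 10300
- Validation (validation): size in bytes (num_bytes) 430784, number of examples (num_examples) 500
- Test (test): size in bytes (num_bytes) 803860, number of examples (num_examples) 1000
#### Download size (download_size): 0; total dataset size (dataset_size): 9596305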
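### Config name (config_name): KodCode
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 252129151, number of examples (num_examples) 100000
- Validation (validation): size in bytes (num_bytes) 2542871, number of examples (num_examples) 1000
- Test (test): size in bytes (num_bytes) 12547950, number of examples (num_examples) 5000
#### Download size (download_size): 0; total dataset size (dataset_size): 267219972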
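### Config name (config_name): RunBugRun
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 8551298, number of examples (num_examples) 10000
- Validation (validation): size in bytes (num_bytes) 832153, number of examples (num_examples) 972
- Test (test): size in bytes (num_bytes) 836082, number of examples (num_examples) 1000
#### Download size (download_size): 0; total dataset size (dataset_size): 10219533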
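### Config name (config_name): TheVault_Csharp
#### Feature fields (features):
Same feature fields as the BFP config.
#### Splits:
- Train (train): size in bytes (num_bytes) 113796459, number of examples (num_examples) 100000
- Validation (validation): size in bytes (num_bytes) 942458, number of examples (num_examples) 1000
- Test (test): size in bytes (num_bytes) 6611232, number of examples (num_examples) 5000
#### Download size (download_size): 0; total dataset size (dataset_size): 121350149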
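The per-split byte counts listed for each config can be cross-checked against its reported `dataset_size`. The sketch below copies the figures from this card into a plain dictionary; `CONFIG_STATS`, `size_mismatches`, and `total_examples` are hypothetical names introduced only for this illustration.

```python
# Figures copied from the card: (train, validation, test) per config.
CONFIG_STATS = {
    "BFP": {"num_bytes": (26304705, 570148, 2817473),
            "num_examples": (46680, 1000, 5000),
            "dataset_size": 29692326},
    "CONCODE": {"num_bytes": (125765953, 1192155, 2493229),
                "num_examples": (100000, 1000, 2000),
                "dataset_size": 129451337},
    "CoST": {"num_bytes": (6248110, 133734, 207208),
             "num_examples": (12645, 272, 410),
             "dataset_size": 6589052},
    "CodeSearchNet": {"num_bytes": (19165860, 771432, 1008829),
                      "num_examples": (24927, 1000, 1261),
                      "dataset_size": 20946121},
    "CodeTrans": {"num_bytes": (8361661, 430784, 803860),
                  "num_examples": (10300, 500, 1000),
                  "dataset_size": 9596305},
    "KodCode": {"num_bytes": (252129151, 2542871, 12547950),
                "num_examples": (100000, 1000, 5000),
                "dataset_size": 267219972},
    "RunBugRun": {"num_bytes": (8551298, 832153, 836082),
                  "num_examples": (10000, 972, 1000),
                  "dataset_size": 10219533},
    "TheVault_Csharp": {"num_bytes": (113796459, 942458, 6611232),
                        "num_examples": (100000, 1000, 5000),
                        "dataset_size": 121350149},
}

SPLITS = ("train", "validation", "test")


def size_mismatches(stats: dict) -> list:
    """Names of configs whose split num_bytes do not sum to dataset_size."""
    return [name for name, entry in stats.items()
            if sum(entry["num_bytes"]) != entry["dataset_size"]]


def total_examples(stats: dict) -> dict:
    """Sum num_examples over all configs, keyed by split name."""
    totals = dict.fromkeys(SPLITS, 0)
    for entry in stats.values():
        for split, count in zip(SPLITS, entry["num_examples"]):
            totals[split] += count
    return totals


print("configs with inconsistent sizes:", size_mismatches(CONFIG_STATS))
print("examples per split:", total_examples(CONFIG_STATS))
```

For the figures on this card, each config's `dataset_size` equals the sum of its three split sizes. The `download_size` values are not covered by this check.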
---
## Dataset configs and data file paths
The data file paths for each config are as follows:
1. Config BFP:
   - Train: `BFP/train-*`
   - Validation: `BFP/validation-*`
   - Test: `BFP/test-*`
2. Config CONCODE:
   - Train: `CONCODE/train-*`
   - Validation: `CONCODE/validation-*`
   - Test: `CONCODE/test-*`
3. Config CoST:
   - Train: `CoST/train-*`
   - Validation: `CoST/validation-*`
   - Test: `CoST/test-*`
4. Config CodeSearchNet:
   - Train: `CodeSearchNet/train-*`
   - Validation: `CodeSearchNet/validation-*`
   - Test: `CodeSearchNet/test-*`
5. Config CodeTrans:
   - Train: `CodeTrans/train-*`
   - Validation: `CodeTrans/validation-*`
   - Test: `CodeTrans/test-*`
6. Config KodCode:
   - Train: `KodCode/train-*`
   - Validation: `KodCode/validation-*`
   - Test: `KodCode/test-*`
7. Config RunBugRun:
   - Train: `RunBugRun/train-*`
   - Validation: `RunBugRun/validation-*`
   - Test: `RunBugRun/test-*`
8. Config TheVault_Csharp:
   - Train: `TheVault_Csharp/train-*`
   - Validation: `TheVault_Csharp/validation-*`
   - Test: `TheVault_Csharp/test-*`
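Every path above follows the same `<config>/<split>-*` glob pattern, so the mapping from a file to its config and split can be derived mechanically. The sketch below uses only the Python standard library; `data_file_patterns` and `config_and_split` are hypothetical helper names, and the shard filename in the usage line is an assumed example, since the card does not list actual file names.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Tuple

CONFIGS = ("BFP", "CONCODE", "CoST", "CodeSearchNet",
           "CodeTrans", "KodCode", "RunBugRun", "TheVault_Csharp")
SPLITS = ("train", "validation", "test")


def data_file_patterns(config: str) -> Dict[str, str]:
    """Build the split -> glob pattern mapping listed for one config."""
    return {split: f"{config}/{split}-*" for split in SPLITS}


def config_and_split(path: str) -> Optional[Tuple[str, str]]:
    """Return (config, split) for a repository-relative path, or None."""
    for config in CONFIGS:
        for split, pattern in data_file_patterns(config).items():
            # fnmatchcase keeps the match case-sensitive on every platform.
            if fnmatchcase(path, pattern):
                return config, split
    return None


print(data_file_patterns("BFP"))
# Assumed shard name, for illustration only.
print(config_and_split("CoST/validation-00000-of-00001.parquet"))
```

With the Hugging Face `datasets` library installed, a single config would typically be loaded by passing its name as the second argument, for example `load_dataset("dongg18/CODETASK_with_instruction_pool", "BFP")`, which relies on these patterns to locate the files for each split.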
Provider:
dongg18



