dongg18/CODETASK_with_instruction_pool

Source: Hugging Face | Updated: 2026-04-13 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/dongg18/CODETASK_with_instruction_pool
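Below is a minimal loading sketch. It assumes the Hugging Face `datasets` library is installed. It also assumes the mirror above is reached by setting the `HF_ENDPOINT` environment variable, which is a convention documented by hf-mirror.com, not by this card. The config names and field types come from the dataset card's metadata. `load_config` and `check_record` are hypothetical helper names introduced for illustration.

```python
import os

# Assumption: HF_ENDPOINT must be set before `datasets` / `huggingface_hub`
# are first imported, otherwise the default endpoint is used.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "dongg18/CODETASK_with_instruction_pool"

# The eight config names listed in the card.
CONFIGS = (
    "BFP", "CONCODE", "CoST", "CodeSearchNet",
    "CodeTrans", "KodCode", "RunBugRun", "TheVault_Csharp",
)
SPLITS = ("train", "validation", "test")

# Field types from the card: most fields are plain strings,
# while `outputs` and `definition` are sequences of strings.
STRING_FIELDS = ("task", "split", "id", "input", "output",
                 "positive_examples", "negative_examples")
SEQUENCE_FIELDS = ("outputs", "definition")


def check_record(record: dict) -> None:
    """Raise TypeError if a record does not match the card's feature types."""
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            raise TypeError(f"{name!r} should be a string")
    for name in SEQUENCE_FIELDS:
        value = record.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{name!r} should be a list of strings")


def load_config(name: str, split: str = "train"):
    """Hypothetical helper: validate the arguments, then download one split."""
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the argument checks above work offline.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name, split=split)
```

For example, `ds = load_config("BFP", "validation")` should yield 1000 rows according to the card's metadata. `check_record(ds[0])` then spot-checks that the row's fields have the declared types.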
Resource description:

---
dataset_info:
- config_name: BFP
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 26304705
    num_examples: 46680
  - name: validation
    num_bytes: 570148
    num_examples: 1000
  - name: test
    num_bytes: 2817473
    num_examples: 5000
  download_size: 0
  dataset_size: 29692326
- config_name: CONCODE
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 125765953
    num_examples: 100000
  - name: validation
    num_bytes: 1192155
    num_examples: 1000
  - name: test
    num_bytes: 2493229
    num_examples: 2000
  download_size: 0
  dataset_size: 129451337
- config_name: CoST
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 6248110
    num_examples: 12645
  - name: validation
    num_bytes: 133734
    num_examples: 272
  - name: test
    num_bytes: 207208
    num_examples: 410
  download_size: 0
  dataset_size: 6589052
- config_name: CodeSearchNet
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 19165860
    num_examples: 24927
  - name: validation
    num_bytes: 771432
    num_examples: 1000
  - name: test
    num_bytes: 1008829
    num_examples: 1261
  download_size: 8544530
  dataset_size: 20946121
- config_name: CodeTrans
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 8361661
    num_examples: 10300
  - name: validation
    num_bytes: 430784
    num_examples: 500
  - name: test
    num_bytes: 803860
    num_examples: 1000
  download_size: 0
  dataset_size: 9596305
- config_name: KodCode
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 252129151
    num_examples: 100000
  - name: validation
    num_bytes: 2542871
    num_examples: 1000
  - name: test
    num_bytes: 12547950
    num_examples: 5000
  download_size: 0
  dataset_size: 267219972
- config_name: RunBugRun
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 8551298
    num_examples: 10000
  - name: validation
    num_bytes: 832153
    num_examples: 972
  - name: test
    num_bytes: 836082
    num_examples: 1000
  download_size: 0
  dataset_size: 10219533
- config_name: TheVault_Csharp
  features:
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: id
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: outputs
    sequence: string
  - name: definition
    sequence: string
  - name: positive_examples
    dtype: string
  - name: negative_examples
    dtype: string
  splits:
  - name: train
    num_bytes: 113796459
    num_examples: 100000
  - name: validation
    num_bytes: 942458
    num_examples: 1000
  - name: test
    num_bytes: 6611232
    num_examples: 5000
  download_size: 0
  dataset_size: 121350149
configs:
- config_name: BFP
  data_files:
  - split: train
    path: BFP/train-*
  - split: validation
    path: BFP/validation-*
  - split: test
    path: BFP/test-*
- config_name: CONCODE
  data_files:
  - split: train
    path: CONCODE/train-*
  - split: validation
    path: CONCODE/validation-*
  - split: test
    path: CONCODE/test-*
- config_name: CoST
  data_files:
  - split: train
    path: CoST/train-*
  - split: validation
    path: CoST/validation-*
  - split: test
    path: CoST/test-*
- config_name: CodeSearchNet
  data_files:
  - split: train
    path: CodeSearchNet/train-*
  - split: validation
    path: CodeSearchNet/validation-*
  - split: test
    path: CodeSearchNet/test-*
- config_name: CodeTrans
  data_files:
  - split: train
    path: CodeTrans/train-*
  - split: validation
    path: CodeTrans/validation-*
  - split: test
    path: CodeTrans/test-*
- config_name: KodCode
  data_files:
  - split: train
    path: KodCode/train-*
  - split: validation
    path: KodCode/validation-*
  - split: test
    path: KodCode/test-*
- config_name: RunBugRun
  data_files:
  - split: train
    path: RunBugRun/train-*
  - split: validation
    path: RunBugRun/validation-*
  - split: test
    path: RunBugRun/test-*
- config_name: TheVault_Csharp
  data_files:
  - split: train
    path: TheVault_Csharp/train-*
  - split: validation
    path: TheVault_Csharp/validation-*
  - split: test
    path: TheVault_Csharp/test-*
---

# Dataset Card for "CODETASK_with_instruction_pool"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
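The `configs` block maps each split to a glob pattern such as `BFP/train-*`. Below is a minimal sketch of how such patterns select files, using the standard library's `fnmatch.fnmatchcase`. The card does not list the actual shard files, so the file names in the example are hypothetical. The `datasets` library's own pattern resolution may also differ in details. `data_file_patterns` and `assign_files` are illustrative helper names.

```python
from fnmatch import fnmatchcase

SPLITS = ("train", "validation", "test")


def data_file_patterns(config: str) -> dict:
    """Rebuild the split -> glob pattern mapping used in the card's `configs` block."""
    return {split: f"{config}/{split}-*" for split in SPLITS}


def assign_files(config: str, files: list) -> dict:
    """Group repository file paths into splits by matching each split's pattern."""
    patterns = data_file_patterns(config)
    # fnmatchcase is case-sensitive on every OS, unlike fnmatch.fnmatch.
    return {split: sorted(f for f in files if fnmatchcase(f, pattern))
            for split, pattern in patterns.items()}


# Hypothetical shard names, for illustration only.
example_files = [
    "BFP/train-00000-of-00002.parquet",
    "BFP/train-00001-of-00002.parquet",
    "BFP/test-00000-of-00001.parquet",
    "CoST/train-00000-of-00001.parquet",
]
print(assign_files("BFP", example_files))
```

Because each pattern is prefixed with the config directory, files from another config (here `CoST/...`) are never picked up. A split with no matching files simply maps to an empty list.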

# Dataset Card for "Code Tasks with Instruction Pool" (CODETASK_with_instruction_pool)

[More information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

---

## Dataset Information

This dataset contains 8 configurations, detailed below.

### Config name (config_name): BFP

#### Features (features):

- Task (task): string
- Split (split): string
- Sample ID (id): string
- Input (input): string
- Output (output): string
- Multiple outputs (outputs): sequence of strings
- Task definition (definition): sequence of strings
- Positive examples (positive_examples): string
- Negative examples (negative_examples): string

#### Splits:

- Train (train): 26304705 bytes (num_bytes), 46680 examples (num_examples)
- Validation (validation): 570148 bytes (num_bytes), 1000 examples (num_examples)
- Test (test): 2817473 bytes (num_bytes), 5000 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 29692326

### Config name (config_name): CONCODE

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 125765953 bytes (num_bytes), 100000 examples (num_examples)
- Validation (validation): 1192155 bytes (num_bytes), 1000 examples (num_examples)
- Test (test): 2493229 bytes (num_bytes), 2000 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 129451337

### Config name (config_name): CoST

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 6248110 bytes (num_bytes), 12645 examples (num_examples)
- Validation (validation): 133734 bytes (num_bytes), 272 examples (num_examples)
- Test (test): 207208 bytes (num_bytes), 410 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 6589052

### Config name (config_name): CodeSearchNet

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 19165860 bytes (num_bytes), 24927 examples (num_examples)
- Validation (validation): 771432 bytes (num_bytes), 1000 examples (num_examples)
- Test (test): 1008829 bytes (num_bytes), 1261 examples (num_examples)

#### Download size (download_size): 8544530; total dataset size (dataset_size): 20946121

### Config name (config_name): CodeTrans

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 8361661 bytes (num_bytes), 10300 examples (num_examples)
- Validation (validation): 430784 bytes (num_bytes), 500 examples (num_examples)
- Test (test): 803860 bytes (num_bytes), 1000 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 9596305

### Config name (config_name): KodCode

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 252129151 bytes (num_bytes), 100000 examples (num_examples)
- Validation (validation): 2542871 bytes (num_bytes), 1000 examples (num_examples)
- Test (test): 12547950 bytes (num_bytes), 5000 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 267219972

### Config name (config_name): RunBugRun

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 8551298 bytes (num_bytes), 10000 examples (num_examples)
- Validation (validation): 832153 bytes (num_bytes), 972 examples (num_examples)
- Test (test): 836082 bytes (num_bytes), 1000 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 10219533

### Config name (config_name): TheVault_Csharp

#### Features (features): same as the BFP configuration

#### Splits:

- Train (train): 113796459 bytes (num_bytes), 100000 examples (num_examples)
- Validation (validation): 942458 bytes (num_bytes), 1000 examples (num_examples)
- Test (test): 6611232 bytes (num_bytes), 5000 examples (num_examples)

#### Download size (download_size): 0; total dataset size (dataset_size): 121350149

---

## Dataset Configurations and Data File Paths

The data file paths for each configuration are as follows:

1. Config BFP:
   - Train: `BFP/train-*`
   - Validation: `BFP/validation-*`
   - Test: `BFP/test-*`
2. Config CONCODE:
   - Train: `CONCODE/train-*`
   - Validation: `CONCODE/validation-*`
   - Test: `CONCODE/test-*`
3. Config CoST:
   - Train: `CoST/train-*`
   - Validation: `CoST/validation-*`
   - Test: `CoST/test-*`
4. Config CodeSearchNet:
   - Train: `CodeSearchNet/train-*`
   - Validation: `CodeSearchNet/validation-*`
   - Test: `CodeSearchNet/test-*`
5. Config CodeTrans:
   - Train: `CodeTrans/train-*`
   - Validation: `CodeTrans/validation-*`
   - Test: `CodeTrans/test-*`
6. Config KodCode:
   - Train: `KodCode/train-*`
   - Validation: `KodCode/validation-*`
   - Test: `KodCode/test-*`
7. Config RunBugRun:
   - Train: `RunBugRun/train-*`
   - Validation: `RunBugRun/validation-*`
   - Test: `RunBugRun/test-*`
8. Config TheVault_Csharp:
   - Train: `TheVault_Csharp/train-*`
   - Validation: `TheVault_Csharp/validation-*`
   - Test: `TheVault_Csharp/test-*`
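In every configuration, `dataset_size` equals the sum of the three splits' `num_bytes`. For BFP, for example, 26304705 + 570148 + 2817473 = 29692326. `download_size` is 0 everywhere except CodeSearchNet, and the card does not explain those zero values. Below is a small sketch that re-checks this arithmetic and totals the example counts across configurations. The numbers are copied from the card; the function names are illustrative.

```python
# (num_bytes, num_examples) per split, and dataset_size per config,
# copied from the dataset card.
SPLIT_SIZES = {
    "BFP": {"train": (26304705, 46680), "validation": (570148, 1000), "test": (2817473, 5000)},
    "CONCODE": {"train": (125765953, 100000), "validation": (1192155, 1000), "test": (2493229, 2000)},
    "CoST": {"train": (6248110, 12645), "validation": (133734, 272), "test": (207208, 410)},
    "CodeSearchNet": {"train": (19165860, 24927), "validation": (771432, 1000), "test": (1008829, 1261)},
    "CodeTrans": {"train": (8361661, 10300), "validation": (430784, 500), "test": (803860, 1000)},
    "KodCode": {"train": (252129151, 100000), "validation": (2542871, 1000), "test": (12547950, 5000)},
    "RunBugRun": {"train": (8551298, 10000), "validation": (832153, 972), "test": (836082, 1000)},
    "TheVault_Csharp": {"train": (113796459, 100000), "validation": (942458, 1000), "test": (6611232, 5000)},
}
DATASET_SIZE = {
    "BFP": 29692326, "CONCODE": 129451337, "CoST": 6589052, "CodeSearchNet": 20946121,
    "CodeTrans": 9596305, "KodCode": 267219972, "RunBugRun": 10219533, "TheVault_Csharp": 121350149,
}


def check_dataset_sizes() -> None:
    """Raise ValueError if a config's dataset_size differs from the sum of its split num_bytes."""
    for name, splits in SPLIT_SIZES.items():
        total = sum(num_bytes for num_bytes, _ in splits.values())
        if total != DATASET_SIZE[name]:
            raise ValueError(f"{name}: splits sum to {total}, card says {DATASET_SIZE[name]}")


def total_examples(split: str) -> int:
    """Number of examples in one split, summed over all eight configs."""
    return sum(splits[split][1] for splits in SPLIT_SIZES.values())


check_dataset_sizes()
for split in ("train", "validation", "test"):
    print(split, total_examples(split))
```

Running it raises no error, confirming the card's byte counts are internally consistent. It then prints 404552 training, 6744 validation, and 20671 test examples across the eight configurations.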
Provided by:
dongg18