stojchet/csn_filtered_subset
Source: Hugging Face. Updated 2024-06-02; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/stojchet/csn_filtered_subset
Resource description:
---
dataset_info:
- config_name: go
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 877763487.7648174
    num_examples: 282184
  - name: validation
    num_bytes: 39170182.18101263
    num_examples: 12719
  - name: test
    num_bytes: 34052630.992171414
    num_examples: 11092
  download_size: 210439217
  dataset_size: 950986300.9380014
- config_name: java
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 1131253062.4569702
    num_examples: 363676
  - name: validation
    num_bytes: 38123098.138120554
    num_examples: 12379
  - name: test
    num_bytes: 66732964.28748918
    num_examples: 21737
  download_size: 356500962
  dataset_size: 1236109124.88258
- config_name: javascript
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 292440593.8633003
    num_examples: 94014
  - name: validation
    num_bytes: 18410201.201202415
    num_examples: 5978
  - name: test
    num_bytes: 15181235.147519622
    num_examples: 4945
  download_size: 140053671
  dataset_size: 326032030.21202236
- config_name: php
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 1188463343.9292386
    num_examples: 382068
  - name: validation
    num_bytes: 56548697.63407138
    num_examples: 18362
  - name: test
    num_bytes: 61572265.34249818
    num_examples: 20056
  download_size: 390937771
  dataset_size: 1306584306.905808
- config_name: python
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 1093813798.2225087
    num_examples: 351640
  - name: validation
    num_bytes: 61670170.4673935
    num_examples: 20025
  - name: test
    num_bytes: 57998765.299684666
    num_examples: 18892
  download_size: 536239180
  dataset_size: 1213482733.9895868
- config_name: ruby
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 131009429.35882549
    num_examples: 42117
  - name: validation
    num_bytes: 5820555.414900061
    num_examples: 1890
  - name: test
    num_bytes: 5679531.854987118
    num_examples: 1850
  download_size: 37336246
  dataset_size: 142509516.62871265
configs:
- config_name: go
  data_files:
  - split: train
    path: go/train-*
  - split: validation
    path: go/validation-*
  - split: test
    path: go/test-*
- config_name: java
  data_files:
  - split: train
    path: java/train-*
  - split: validation
    path: java/validation-*
  - split: test
    path: java/test-*
- config_name: javascript
  data_files:
  - split: train
    path: javascript/train-*
  - split: validation
    path: javascript/validation-*
  - split: test
    path: javascript/test-*
- config_name: php
  data_files:
  - split: train
    path: php/train-*
  - split: validation
    path: php/validation-*
  - split: test
    path: php/test-*
- config_name: python
  data_files:
  - split: train
    path: python/train-*
  - split: validation
    path: python/validation-*
  - split: test
    path: python/test-*
- config_name: ruby
  data_files:
  - split: train
    path: ruby/train-*
  - split: validation
    path: ruby/validation-*
  - split: test
    path: ruby/test-*
---
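The metadata above declares the same eleven features for every configuration: nine plain strings and two string sequences. As a minimal sketch, one row could be modelled in Python as a `TypedDict` and checked against that schema. The names `CsnRecord`, `SEQUENCE_FIELDS`, and `check_record`, and all values in `sample`, are illustrative assumptions and are not part of the dataset.

```python
from typing import List, TypedDict


class CsnRecord(TypedDict):
    """One row, mirroring the features listed in the dataset card."""
    repository_name: str
    func_path_in_repository: str
    func_name: str
    whole_func_string: str
    language: str
    func_code_string: str
    func_code_tokens: List[str]
    func_documentation_string: str
    func_documentation_tokens: List[str]
    split_name: str
    func_code_url: str


# The two features declared as `sequence: string`; all others are `dtype: string`.
SEQUENCE_FIELDS = {"func_code_tokens", "func_documentation_tokens"}


def check_record(row: dict) -> List[str]:
    """Return the problems found in `row`; an empty list means it fits the schema."""
    problems = []
    for field in CsnRecord.__annotations__:
        if field not in row:
            problems.append(f"missing field: {field}")
        elif field in SEQUENCE_FIELDS:
            value = row[field]
            if not (isinstance(value, list) and all(isinstance(t, str) for t in value)):
                problems.append(f"{field} should be a list of strings")
        elif not isinstance(row[field], str):
            problems.append(f"{field} should be a string")
    return problems


# Made-up row for illustration only; it is not taken from the dataset.
sample = {
    "repository_name": "example/repo",
    "func_path_in_repository": "pkg/util.go",
    "func_name": "Add",
    "whole_func_string": "// Add sums two ints.\nfunc Add(a, b int) int { return a + b }",
    "language": "go",
    "func_code_string": "func Add(a, b int) int { return a + b }",
    "func_code_tokens": ["func", "Add", "(", "a", ",", "b", "int", ")"],
    "func_documentation_string": "Add sums two ints.",
    "func_documentation_tokens": ["Add", "sums", "two", "ints", "."],
    "split_name": "train",
    "func_code_url": "https://example.com/example/repo/pkg/util.go#L1-L2",
}

print(check_record(sample))
```

A check like this can be useful before tokenization or training, since a row with a missing field or a string where a token list is expected is reported by name.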
Provider:
stojchet
Summary of original information
Dataset overview
Dataset configurations and features
- Go dataset
  - Features:
    - repository_name: string
    - func_path_in_repository: string
    - func_name: string
    - whole_func_string: string
    - language: string
    - func_code_string: string
    - func_code_tokens: sequence of strings
    - func_documentation_string: string
    - func_documentation_tokens: sequence of strings
    - split_name: string
    - func_code_url: string
  - Splits:
    - Train: 282184 examples, 877763487.7648174 bytes
    - Validation: 12719 examples, 39170182.18101263 bytes
    - Test: 11092 examples, 34052630.992171414 bytes
  - Download size: 210439217 bytes
  - Dataset size: 950986300.9380014 bytes
- Java dataset
  - Features: same as the Go configuration
  - Splits:
    - Train: 363676 examples, 1131253062.4569702 bytes
    - Validation: 12379 examples, 38123098.138120554 bytes
    - Test: 21737 examples, 66732964.28748918 bytes
  - Download size: 356500962 bytes
  - Dataset size: 1236109124.88258 bytes
- JavaScript dataset
  - Features: same as the Go configuration
  - Splits:
    - Train: 94014 examples, 292440593.8633003 bytes
    - Validation: 5978 examples, 18410201.201202415 bytes
    - Test: 4945 examples, 15181235.147519622 bytes
  - Download size: 140053671 bytes
  - Dataset size: 326032030.21202236 bytes
- PHP dataset
  - Features: same as the Go configuration
  - Splits:
    - Train: 382068 examples, 1188463343.9292386 bytes
    - Validation: 18362 examples, 56548697.63407138 bytes
    - Test: 20056 examples, 61572265.34249818 bytes
  - Download size: 390937771 bytes
  - Dataset size: 1306584306.905808 bytes
- Python dataset
  - Features: same as the Go configuration
  - Splits:
    - Train: 351640 examples, 1093813798.2225087 bytes
    - Validation: 20025 examples, 61670170.4673935 bytes
    - Test: 18892 examples, 57998765.299684666 bytes
  - Download size: 536239180 bytes
  - Dataset size: 1213482733.9895868 bytes
- Ruby dataset
  - Features: same as the Go configuration
  - Splits:
    - Train: 42117 examples, 131009429.35882549 bytes
    - Validation: 1890 examples, 5820555.414900061 bytes
    - Test: 1850 examples, 5679531.854987118 bytes
  - Download size: 37336246 bytes
  - Dataset size: 142509516.62871265 bytes
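The per-split example counts listed above can be tallied to compare the configurations. The following is a small sketch that copies the counts from the card; the names `SPLIT_COUNTS` and `split_fractions` are illustrative choices, not part of any library.

```python
# Example counts per configuration, copied from the dataset card,
# in the order (train, validation, test).
SPLIT_COUNTS = {
    "go": (282184, 12719, 11092),
    "java": (363676, 12379, 21737),
    "javascript": (94014, 5978, 4945),
    "php": (382068, 18362, 20056),
    "python": (351640, 20025, 18892),
    "ruby": (42117, 1890, 1850),
}


def split_fractions(config: str) -> dict:
    """Return each split's share of the configuration's examples."""
    train, validation, test = SPLIT_COUNTS[config]
    total = train + validation + test
    return {
        "train": train / total,
        "validation": validation / total,
        "test": test / total,
    }


# Total examples per configuration and across all six configurations.
totals = {name: sum(counts) for name, counts in SPLIT_COUNTS.items()}
grand_total = sum(totals.values())

for name, total in totals.items():
    shares = split_fractions(name)
    print(f"{name:<11}{total:>8}  train share {shares['train']:.1%}")
print(f"{'all':<11}{grand_total:>8}")
```

Summing the counts this way gives 1,665,624 examples in total, with PHP the largest configuration (420,486) and Ruby the smallest (45,857).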
Data file paths
- Go:
  - Train: go/train-*
  - Validation: go/validation-*
  - Test: go/test-*
- Java:
  - Train: java/train-*
  - Validation: java/validation-*
  - Test: java/test-*
- JavaScript:
  - Train: javascript/train-*
  - Validation: javascript/validation-*
  - Test: javascript/test-*
- PHP:
  - Train: php/train-*
  - Validation: php/validation-*
  - Test: php/test-*
- Python:
  - Train: python/train-*
  - Validation: python/validation-*
  - Test: python/test-*
- Ruby:
  - Train: ruby/train-*
  - Validation: ruby/validation-*
  - Test: ruby/test-*
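Every data file path follows the pattern `<config>/<split>-*`. The sketch below builds that pattern and shows one way a split might be loaded with the Hugging Face `datasets` library. The helper names `shard_pattern` and `load_split` and the shard filename in `example_file` are assumptions made for illustration; `load_split` additionally assumes that `datasets` is installed and that the Hub is reachable, so it is defined but not called here.

```python
from fnmatch import fnmatch

CONFIGS = ("go", "java", "javascript", "php", "python", "ruby")
SPLITS = ("train", "validation", "test")


def shard_pattern(config: str, split: str) -> str:
    """Build the glob pattern the card lists for one configuration and split."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{config}/{split}-*"


def load_split(config: str, split: str):
    """Load one split from the Hub (requires `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    shard_pattern(config, split)  # reuse the validation of config and split names
    return load_dataset("stojchet/csn_filtered_subset", config, split=split)


# Hypothetical shard filename, used only to show how the pattern matches.
example_file = "ruby/test-00000-of-00001.parquet"
print(shard_pattern("ruby", "test"), fnmatch(example_file, shard_pattern("ruby", "test")))
```

For example, `load_split("ruby", "validation")` would request the smallest validation split, which is a reasonable first download when trying the dataset out.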



