aalexchengg/codesearchnet_qa
Source: Hugging Face. Updated 2023-12-05; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/aalexchengg/codesearchnet_qa
Resource description:
---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  - name: parameters
    sequence: string
  - name: question
    dtype: string
  - name: answer
    sequence: string
  splits:
  - name: train
    num_bytes: 10502480533
    num_examples: 3117455
  - name: test
    num_bytes: 578050783
    num_examples: 168654
  - name: validation
    num_bytes: 447416570
    num_examples: 138482
  download_size: 3082853329
  dataset_size: 11527947886
---
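This dataset contains function-level records for analyzing and processing source code. Each record carries the repository name, the function's path in the repository, the function name, the function code and its tokens, the documentation string and its tokens, the language, and the source URL, along with `parameters`, `question`, and `answer` fields. The data is divided into train, test, and validation splits; the metadata above lists the size in bytes and the example count of each split.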
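A minimal sketch of how the splits might be read, assuming the Hugging Face `datasets` library is installed and the repository is still reachable under the ID shown in the title. The helper name `stream_split` and the `limit` parameter are illustrative and not part of the dataset card.

```python
from itertools import islice

REPO_ID = "aalexchengg/codesearchnet_qa"
SPLITS = ("train", "test", "validation")


def stream_split(split: str, limit: int = 3):
    """Yield the first `limit` records of one split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so this module loads even where `datasets` is missing.
    from datasets import load_dataset

    # streaming=True iterates over the remote files instead of
    # downloading the full archive before the first record is returned.
    dataset = load_dataset(REPO_ID, split=split, streaming=True)
    yield from islice(dataset, limit)


if __name__ == "__main__":
    for record in stream_split("validation"):
        print(record["func_name"], "|", record["question"])
```

Streaming is used here because the download size listed in the metadata is about 3 GB; dropping `streaming=True` would fetch and cache the whole split first.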
Provided by:
aalexchengg
Summary of original information
Dataset overview
License
- Apache 2.0
Configuration
- Default configuration
  - Data file paths:
    - Training set: data/train-*
    - Test set: data/test-*
    - Validation set: data/validation-*
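The path patterns in the default configuration are glob patterns, so each data file is assigned to a split by its name prefix. The sketch below illustrates that matching with Python's standard `fnmatch` module; the file names in the usage lines are invented for illustration and are not taken from the repository.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the default configuration.
SPLIT_PATTERNS = {
    "train": "data/train-*",
    "test": "data/test-*",
    "validation": "data/validation-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical shard names, used only to exercise the patterns.
print(split_for("data/train-00000-of-00022.parquet"))
print(split_for("README.md"))
```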
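Dataset information
Features
- repository_name: string
- func_path_in_repository: string
- func_name: string
- whole_func_string: string
- language: string
- func_code_string: string
- func_code_tokens: sequence of strings
- func_documentation_string: string
- func_documentation_tokens: sequence of strings
- split_name: string
- func_code_url: string
- parameters: sequence of strings
- question: string
- answer: sequence of strings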
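To make the feature list concrete, here is a small sketch that checks whether a record (as a plain Python mapping) has the listed fields with the listed kinds. The `FEATURES` table is transcribed from the list above; the function name `schema_errors` is an assumption for illustration, not part of the dataset or any library.

```python
from typing import Any, List, Mapping

# Feature name -> kind, transcribed from the feature list.
FEATURES = {
    "repository_name": "string",
    "func_path_in_repository": "string",
    "func_name": "string",
    "whole_func_string": "string",
    "language": "string",
    "func_code_string": "string",
    "func_code_tokens": "sequence",
    "func_documentation_string": "string",
    "func_documentation_tokens": "sequence",
    "split_name": "string",
    "func_code_url": "string",
    "parameters": "sequence",
    "question": "string",
    "answer": "sequence",
}


def schema_errors(record: Mapping[str, Any]) -> List[str]:
    """Return one message per missing or mistyped field (empty if valid)."""
    errors = []
    for name, kind in FEATURES.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if kind == "string":
            ok = isinstance(value, str)
        else:
            # A sequence feature holds a list (or tuple) of strings.
            ok = isinstance(value, (list, tuple)) and all(
                isinstance(item, str) for item in value
            )
        if not ok:
            errors.append(f"{name}: expected {kind}")
    return errors
```

Note that `answer` is a sequence of strings while `question` is a single string, so code that consumes the pair should expect a list on the answer side.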
Data splits
- Training set
  - Bytes: 10502480533
  - Examples: 3117455
- Test set
  - Bytes: 578050783
  - Examples: 168654
- Validation set
  - Bytes: 447416570
  - Examples: 138482
Dataset size
- Download size: 3082853329 bytes
- Dataset size: 11527947886 bytes
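The figures above can be cross-checked with a few lines of arithmetic: the three per-split byte counts sum exactly to the stated dataset size, and the three example counts total 3,424,591. The sketch below recomputes those totals and derives each split's share of the examples and its average record size; the variable names are illustrative.

```python
# Values copied from the split and size figures above.
SPLIT_STATS = {
    "train": {"num_bytes": 10_502_480_533, "num_examples": 3_117_455},
    "test": {"num_bytes": 578_050_783, "num_examples": 168_654},
    "validation": {"num_bytes": 447_416_570, "num_examples": 138_482},
}
DOWNLOAD_SIZE = 3_082_853_329
DATASET_SIZE = 11_527_947_886

total_bytes = sum(s["num_bytes"] for s in SPLIT_STATS.values())
total_examples = sum(s["num_examples"] for s in SPLIT_STATS.values())

# The per-split byte counts should add up to the reported dataset size.
assert total_bytes == DATASET_SIZE

for name, stats in SPLIT_STATS.items():
    share = stats["num_examples"] / total_examples
    avg_bytes = stats["num_bytes"] / stats["num_examples"]
    print(f"{name:<10} {share:6.2%} of examples, ~{avg_bytes:,.0f} bytes each")

# Ratio of the download size to the dataset size.
print(f"download / dataset size: {DOWNLOAD_SIZE / DATASET_SIZE:.3f}")
```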



