stojchet/final_base_dataset
Hugging Face; updated 2024-07-16, indexed 2024-07-22
Download link:
https://hf-mirror.com/datasets/stojchet/final_base_dataset
Resource summary:
This dataset contains function-level code data in two programming languages, Java and Python. Each language's configuration includes features such as the repository name, the function's path in the repository, the function name, the whole function string, the language, the function code string, the function code tokens, the function documentation string, and the function documentation tokens. Each configuration is divided into training, validation, and test splits. The Java configuration contains 41,507 training, 1,000 validation, and 1,000 test examples, with a total size of 228,904,318 bytes. The Python configuration contains 49,689 training, 4,702 validation, and 3,823 test examples, with a total size of 363,352,916 bytes.
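The Python sketch below is a quick sanity check on the split sizes quoted above. It computes each split's share of the examples for each language. `SPLIT_COUNTS` and `split_fractions` are illustrative names and are not part of the dataset or of any library.

```python
# Example counts per split, copied from the dataset card.
SPLIT_COUNTS = {
    "java": {"train": 41507, "validation": 1000, "test": 1000},
    "python": {"train": 49689, "validation": 4702, "test": 3823},
}


def split_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for lang, counts in SPLIT_COUNTS.items():
    total = sum(counts.values())
    shares = split_fractions(counts)
    formatted = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{lang}: {total} examples ({formatted})")
```

By example count, the Java split is roughly 95/2/2 (train/validation/test). The Python split is roughly 85/8/7.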
Provided by:
stojchet
Summary of original information
Dataset overview
Dataset configurations
Java configuration
- Features:
  - repository_name: string
  - func_path_in_repository: string
  - func_name: string
  - whole_func_string: string
  - language: string
  - func_code_string: string
  - func_code_tokens: sequence of string
  - func_documentation_string: string
  - func_documentation_tokens: sequence of string
  - split_name: string
  - func_code_url: string
  - prediction: string
  - prepared_prompt: string
  - func_def: string
- Splits:
  - train:
    - num_bytes: 218385329
    - num_examples: 41507
  - test:
    - num_bytes: 5285635
    - num_examples: 1000
  - validation:
    - num_bytes: 5233354
    - num_examples: 1000
- Download size: 100989615 bytes
- Dataset size: 228904318 bytes
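Both configurations list the same 14 features. Below is a minimal sketch of a record check. It assumes a loaded example arrives as a plain Python dict keyed by feature name, as rows typically do when iterating a Hugging Face dataset, with "sequence of string" features as lists of str. `schema_errors` is a hypothetical helper, and any record you test it on would be made up for illustration.

```python
# Feature names as listed in the configuration summary.
# "string" features should be str; "sequence of string" features a list of str.
STRING_FIELDS = [
    "repository_name",
    "func_path_in_repository",
    "func_name",
    "whole_func_string",
    "language",
    "func_code_string",
    "func_documentation_string",
    "split_name",
    "func_code_url",
    "prediction",
    "prepared_prompt",
    "func_def",
]
TOKEN_FIELDS = ["func_code_tokens", "func_documentation_tokens"]


def schema_errors(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    errors = []
    for name in STRING_FIELDS:
        # Missing keys yield None, which also fails the str check.
        if not isinstance(record.get(name), str):
            errors.append(f"{name}: expected str")
    for name in TOKEN_FIELDS:
        value = record.get(name)
        if not isinstance(value, list) or not all(isinstance(t, str) for t in value):
            errors.append(f"{name}: expected list of str")
    # Flag any keys that are not among the 14 listed features.
    extra = set(record) - set(STRING_FIELDS) - set(TOKEN_FIELDS)
    errors.extend(f"{name}: unexpected field" for name in sorted(extra))
    return errors


# A made-up record: every string feature empty, token features short lists.
example = {name: "" for name in STRING_FIELDS}
example.update({"func_code_tokens": ["return", "x"], "func_documentation_tokens": []})
print(schema_errors(example))
```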
Python configuration
- Features:
  - repository_name: string
  - func_path_in_repository: string
  - func_name: string
  - whole_func_string: string
  - language: string
  - func_code_string: string
  - func_code_tokens: sequence of string
  - func_documentation_string: string
  - func_documentation_tokens: sequence of string
  - split_name: string
  - func_code_url: string
  - prediction: string
  - prepared_prompt: string
  - func_def: string
- Splits:
  - train:
    - num_bytes: 319893577
    - num_examples: 49689
  - validation:
    - num_bytes: 23838917
    - num_examples: 4702
  - test:
    - num_bytes: 19620422
    - num_examples: 3823
- Download size: 166281550 bytes
- Dataset size: 363352916 bytes
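The byte counts in the two configurations are internally consistent: each dataset size equals the sum of its per-split `num_bytes`. The sketch below checks this and compares download size to dataset size. The `CONFIGS` layout and the `check_sizes` helper are illustrative. Reading download size as the size of the compressed data files is an assumption.

```python
# Per-split num_bytes plus download and dataset sizes (bytes), copied from the card.
CONFIGS = {
    "java": {
        "splits": {"train": 218385329, "test": 5285635, "validation": 5233354},
        "download_size": 100989615,
        "dataset_size": 228904318,
    },
    "python": {
        "splits": {"train": 319893577, "validation": 23838917, "test": 19620422},
        "download_size": 166281550,
        "dataset_size": 363352916,
    },
}


def check_sizes(info: dict) -> float:
    """Verify dataset_size equals the sum of split num_bytes; return download/dataset ratio."""
    split_total = sum(info["splits"].values())
    if split_total != info["dataset_size"]:
        raise ValueError(f"splits sum to {split_total}, card says {info['dataset_size']}")
    return info["download_size"] / info["dataset_size"]


for name, info in CONFIGS.items():
    print(f"{name}: download/dataset size ratio {check_sizes(info):.2f}")
```

Both checks pass. Each configuration downloads at a bit under half its stated dataset size: about 0.44 for Java and 0.46 for Python.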
Data file paths
Java configuration
- train: java/train-*
- validation: java/validation-*
- test: java/test-*
Python configuration
- train: python/train-*
- validation: python/validation-*
- test: python/test-*
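The data file entries are glob patterns. The card only gives prefixes like `train-*`, so the sketch below assumes a hypothetical shard naming such as `java/train-00000-of-00001.parquet`. It maps a repository file path to its configuration and split with the standard-library `fnmatch` module. `assign_split` is an illustrative helper.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Glob patterns per configuration and split, copied from the card.
DATA_FILES = {
    "java": {
        "train": "java/train-*",
        "validation": "java/validation-*",
        "test": "java/test-*",
    },
    "python": {
        "train": "python/train-*",
        "validation": "python/validation-*",
        "test": "python/test-*",
    },
}


def assign_split(path: str) -> Optional[Tuple[str, str]]:
    """Return (configuration, split) for the first pattern matching path, else None."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
            if fnmatchcase(path, pattern):
                return config, split
    return None


# Hypothetical shard names; the real file names may differ.
for path in [
    "java/train-00000-of-00001.parquet",
    "python/test-00000-of-00001.parquet",
    "README.md",
]:
    print(path, "->", assign_split(path))
```

With the Hugging Face `datasets` library, `load_dataset("stojchet/final_base_dataset", "java")` would presumably select the Java configuration. This assumes the configuration is named `java`, as the file paths suggest.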



