JetBrains-Research/template-generation
Source: Hugging Face. Updated 2024-09-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/JetBrains-Research/template-generation
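Summary:
A benchmark dataset for evaluating AI agents on project template generation. It contains real-world project templates collected from GitHub. For each template it provides the project description, the contents of its README.md file, a link to the project, and additional data and metrics about the GitHub repository. The data was filtered, augmented, and manually annotated for quality, and it is organized into several categories (Python, Java, Kotlin, Android) and splits (dev, test, train). Its main purpose is to evaluate project template generation methods, and example code for loading the dataset is provided.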
Provided by:
JetBrains-Research
Summary of original information
Dataset overview
Dataset configurations
Android
- Features (37 columns):
  - id: int64
  - full_name, owner, name, html_url: string
  - is_template: bool
  - description, template_keywords, license, topics: string
  - size: int64
  - metrics, languages, language, created_at, updated_at, code_lines, gpt_description: string
  - repo_symbols_count, repo_tokens_count, repo_words_count, repo_lines_count, repo_files_count: int64
  - repo_code_symbols_count, repo_code_tokens_count, repo_code_words_count, repo_code_lines_count, repo_code_files_count: int64
  - description_symbols_count, description_tokens_count, description_words_count, description_lines_count: int64
  - readme: string
  - readme_symbols_count, readme_tokens_count, readme_words_count, readme_lines_count: int64
- Splits:
  - dev: 78114 bytes, 17 examples
  - test: 4594.941176470588 bytes, 1 example
  - train: 73519.05882352941 bytes, 16 examples
- Download size: 177547 bytes
- Dataset size: 156228.0 bytes
Java
- Features: the same 37 columns as the Android configuration.
- Splits:
  - dev: 357161 bytes, 81 examples
  - test: 44093.950617283954 bytes, 10 examples
  - train: 313067.04938271607 bytes, 71 examples
- Download size: 418713 bytes
- Dataset size: 714322.0 bytes
Kotlin (kt)
- Features: the same 37 columns as the Android configuration.
- Splits:
  - dev: 88507 bytes, 19 examples
  - test: 46582.63157894737 bytes, 10 examples
  - train: 41924.36842105263 bytes, 9 examples
- Download size: 196218 bytes
- Dataset size: 177014.0 bytes
Python (py)
- Features: the same 37 columns as the Android configuration.
- Splits:
  - dev: 2849185 bytes, 565 examples
  - test: 50428.05309734513 bytes, 10 examples
  - train: 2798756.946902655 bytes, 555 examples
- Download size: 2868238 bytes
- Dataset size: 5698370.0 bytes
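The fractional byte counts for the test and train splits can be reproduced with simple arithmetic. In every configuration, the dev example count equals test plus train (17 = 1 + 16, 81 = 10 + 71, 19 = 10 + 9, 565 = 10 + 555). Each test or train byte figure equals the dev byte count scaled by that split's share of the dev examples, and the dataset size is exactly twice the dev byte count. This suggests, as an inference from the numbers rather than something the page states, that dev is the union of test and train and that the per-split sizes are estimates. The sketch below recomputes the figures; the function name `estimate_split_bytes` and the `CONFIGS` table are illustrative, not part of the dataset.

```python
# Figures copied from the configuration listings above.
CONFIGS = {
    "android": {"dev_bytes": 78114, "test": 1, "train": 16},
    "java": {"dev_bytes": 357161, "test": 10, "train": 71},
    "kt": {"dev_bytes": 88507, "test": 10, "train": 9},
    "py": {"dev_bytes": 2849185, "test": 10, "train": 555},
}


def estimate_split_bytes(dev_bytes: int, n_test: int, n_train: int) -> dict:
    """Scale dev's byte count by each split's share of dev's examples.

    Assumes (inferred, not documented) that dev = test + train.
    """
    n_dev = n_test + n_train
    per_example = dev_bytes / n_dev
    return {
        "test": per_example * n_test,
        "train": per_example * n_train,
        # The listed dataset size equals dev + test + train, i.e. 2 * dev.
        "dataset": float(2 * dev_bytes),
    }


if __name__ == "__main__":
    for name, c in CONFIGS.items():
        est = estimate_split_bytes(c["dev_bytes"], c["test"], c["train"])
        print(f"{name}: test={est['test']:.2f} "
              f"train={est['train']:.2f} total={est['dataset']:.1f}")
```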
Data file paths
Android
- dev: android/dev-*
- test: android/test-*
- train: android/train-*
Java
- dev: java/dev-*
- test: java/test-*
- train: java/train-*
Kotlin (kt)
- dev: kt/dev-*
- test: kt/test-*
- train: kt/train-*
Python (py)
- dev: py/dev-*
- test: py/test-*
- train: py/train-*
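The loading code that the summary mentions is not reproduced on this page. One possible approach, offered as a sketch, passes the glob patterns listed above to the Hugging Face `datasets` library's `load_dataset` through its `data_files` argument, keyed by split name. The helper `data_files_for` and the `CONFIG_DIRS` mapping are hypothetical; the directory names come from the paths above, and whether they also match the dataset card's configuration names is an assumption.

```python
SPLITS = ("dev", "test", "train")
# Directory prefixes taken from the data file paths listed above.
CONFIG_DIRS = {"Android": "android", "Java": "java", "Kotlin": "kt", "Python": "py"}


def data_files_for(config_dir: str) -> dict:
    """Build a split -> glob pattern mapping, e.g. {"dev": "py/dev-*", ...}."""
    return {split: f"{config_dir}/{split}-*" for split in SPLITS}


if __name__ == "__main__":
    # Requires `pip install datasets` and network access to the Hugging Face Hub.
    from datasets import load_dataset

    ds = load_dataset(
        "JetBrains-Research/template-generation",
        data_files=data_files_for(CONFIG_DIRS["Python"]),
    )
    dev = ds["dev"]
    print(dev.num_rows)  # the listing above reports 565 dev examples for Python
    print(dev[0]["full_name"], dev[0]["html_url"])
```

Keeping the pattern construction in a small pure function lets the split layout be checked without downloading anything; only the `__main__` block touches the network.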



