AISE-TUDelft/multilingual-code-comments-fixed
Source: Hugging Face. Updated: 2026-01-09. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/AISE-TUDelft/multilingual-code-comments-fixed
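The link above appears to follow the pattern `<endpoint>/datasets/<repo_id>`. The sketch below rebuilds it and shows one way to point Hub clients at the mirror. It assumes the `huggingface_hub` client reads the `HF_ENDPOINT` environment variable, and that the variable is set before the client library is imported. The helper names `dataset_page_url` and `use_mirror` are hypothetical and not part of any library.

```python
import os

REPO_ID = "AISE-TUDelft/multilingual-code-comments-fixed"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_page_url(repo_id: str, endpoint: str) -> str:
    """Build the dataset page URL for a Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def use_mirror(endpoint: str = MIRROR_ENDPOINT) -> None:
    """Set HF_ENDPOINT so Hub clients use the mirror.

    Assumption: this must run before huggingface_hub or datasets is imported,
    because the endpoint is typically read at import time.
    """
    os.environ["HF_ENDPOINT"] = endpoint


if __name__ == "__main__":
    use_mirror()
    print(dataset_page_url(REPO_ID, MIRROR_ENDPOINT))
```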
Resource description:
---
dataset_info:
- config_name: Chinese
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  splits:
  - name: train
    num_bytes: 21631801
    num_examples: 500
  download_size: 8927665
  dataset_size: 21631801
- config_name: Dutch
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  splits:
  - name: train
    num_bytes: 24073258
    num_examples: 500
  download_size: 9180742
  dataset_size: 24073258
- config_name: English
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  splits:
  - name: train
    num_bytes: 20540810
    num_examples: 500
  download_size: 8130598
  dataset_size: 20540810
- config_name: Greek
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  splits:
  - name: train
    num_bytes: 25626813
    num_examples: 500
  download_size: 9167871
  dataset_size: 25626813
- config_name: Polish
  features:
  - name: file_id
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: content
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  splits:
  - name: train
    num_bytes: 17775627
    num_examples: 500
  download_size: 7233103
  dataset_size: 17775627
configs:
- config_name: Chinese
  data_files:
  - split: train
    path: Chinese/train-*
- config_name: Dutch
  data_files:
  - split: train
    path: Dutch/train-*
- config_name: English
  data_files:
  - split: train
    path: English/train-*
- config_name: Greek
  data_files:
  - split: train
    path: Greek/train-*
- config_name: Polish
  data_files:
  - split: train
    path: Polish/train-*
---
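The per-model column names in the metadata above follow a regular `<prefix>_<model>` pattern, so they can be generated rather than typed by hand. The sketch below is a minimal illustration, not the dataset authors' tooling. The helper names `model_column`, `expected_columns`, and `load_config` are hypothetical, and `load_config` assumes the `datasets` library is installed and the Hub (or a mirror) is reachable.

```python
# Model identifiers and column prefixes copied from the dataset metadata.
MODELS = [
    "Qwen/CodeQwen1.5-7B",
    "bigcode/starcoder2-7b",
    "ibm-granite/granite-8b-code-base",
    "meta-llama/CodeLlama-7b-hf",
    "google/codegemma-7b",
]
BASE_COLUMNS = ["file_id", "content", "repo", "path", "original_comment"]
PER_MODEL_PREFIXES = [
    "masked_data",
    "predict",
    "predicted_comment",
    "expert_accuracy",
    "error_codes",
]


def model_column(prefix: str, model: str) -> str:
    """Return the column name for one prefix and one model (hypothetical helper)."""
    if prefix not in PER_MODEL_PREFIXES:
        raise ValueError(f"unknown column prefix: {prefix!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return f"{prefix}_{model}"


def expected_columns() -> set:
    """Return the full set of column names listed for every configuration."""
    per_model = {model_column(p, m) for p in PER_MODEL_PREFIXES for m in MODELS}
    return set(BASE_COLUMNS) | per_model


def load_config(language: str):
    """Load the train split of one configuration (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(
        "AISE-TUDelft/multilingual-code-comments-fixed", language, split="train"
    )
```

With the dataset available, `load_config("Greek")[0][model_column("predicted_comment", "google/codegemma-7b")]` would read one predicted comment. Because column order differs between configurations, comparing `set(ds.column_names)` against `expected_columns()` is safer than comparing lists.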
Dataset information:
- Configurations: Chinese, Dutch, English, Greek, Polish. Each has a single train split of 500 examples.
- Features: every configuration has the same 30 columns, all of data type string. Only the column order differs between configurations.
  - File-level columns: file_id, content, repo, path, original_comment
  - Per-model columns: masked_data_<model>, predict_<model>, predicted_comment_<model>, expert_accuracy_<model>, error_codes_<model>
  - Models: Qwen/CodeQwen1.5-7B, bigcode/starcoder2-7b, ibm-granite/granite-8b-code-base, meta-llama/CodeLlama-7b-hf, google/codegemma-7b
- Data splits (all sizes in bytes):
  - Chinese: split train, num_bytes 21631801, 500 examples; download size 8927665, dataset size 21631801
  - Dutch: split train, num_bytes 24073258, 500 examples; download size 9180742, dataset size 24073258
  - English: split train, num_bytes 20540810, 500 examples; download size 8130598, dataset size 20540810
  - Greek: split train, num_bytes 25626813, 500 examples; download size 9167871, dataset size 25626813
  - Polish: split train, num_bytes 17775627, 500 examples; download size 7233103, dataset size 17775627
Configs:
- Config name: Chinese, data files:
  - split: train, path: Chinese/train-*
- Config name: Dutch, data files:
  - split: train, path: Dutch/train-*
- Config name: English, data files:
  - split: train, path: English/train-*
- Config name: Greek, data files:
  - split: train, path: Greek/train-*
- Config name: Polish, data files:
  - split: train, path: Polish/train-*
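The split sizes listed above can be combined into totals and a rough download-to-dataset size ratio per configuration. The following sketch only does arithmetic on the figures already given in the metadata. The names `SIZES` and `download_ratio` are hypothetical, and reading the ratio as a compression factor is an assumption about how the files are stored.

```python
# (dataset_size, download_size) in bytes, copied from the metadata for each configuration.
SIZES = {
    "Chinese": (21631801, 8927665),
    "Dutch": (24073258, 9180742),
    "English": (20540810, 8130598),
    "Greek": (25626813, 9167871),
    "Polish": (17775627, 7233103),
}
EXAMPLES_PER_CONFIG = 500  # every configuration lists 500 train examples


def download_ratio(config: str) -> float:
    """Return download_size / dataset_size for one configuration."""
    dataset_size, download_size = SIZES[config]
    return download_size / dataset_size


total_dataset_bytes = sum(dataset for dataset, _ in SIZES.values())
total_download_bytes = sum(download for _, download in SIZES.values())
total_examples = EXAMPLES_PER_CONFIG * len(SIZES)

if __name__ == "__main__":
    for name in SIZES:
        print(f"{name}: download/dataset ratio = {download_ratio(name):.3f}")
    print(f"total: {total_examples} examples, {total_dataset_bytes} bytes in memory, "
          f"{total_download_bytes} bytes to download")
```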
Provider:
AISE-TUDelft



