AISE-TUDelft/multilingual-code-comments-fixed

Source: Hugging Face. Updated: 2026-01-09. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/AISE-TUDelft/multilingual-code-comments-fixed
Resource description:
---
dataset_info:
- config_name: Chinese
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  splits:
  - name: train
    num_bytes: 21631801
    num_examples: 500
  download_size: 8927665
  dataset_size: 21631801
- config_name: Dutch
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  splits:
  - name: train
    num_bytes: 24073258
    num_examples: 500
  download_size: 9180742
  dataset_size: 24073258
- config_name: English
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  splits:
  - name: train
    num_bytes: 20540810
    num_examples: 500
  download_size: 8130598
  dataset_size: 20540810
- config_name: Greek
  features:
  - name: file_id
    dtype: string
  - name: content
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  splits:
  - name: train
    num_bytes: 25626813
    num_examples: 500
  download_size: 9167871
  dataset_size: 25626813
- config_name: Polish
  features:
  - name: file_id
    dtype: string
  - name: repo
    dtype: string
  - name: path
    dtype: string
  - name: content
    dtype: string
  - name: original_comment
    dtype: string
  - name: masked_data_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predict_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: predicted_comment_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: masked_data_bigcode/starcoder2-7b
    dtype: string
  - name: predict_bigcode/starcoder2-7b
    dtype: string
  - name: predicted_comment_bigcode/starcoder2-7b
    dtype: string
  - name: masked_data_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predict_ibm-granite/granite-8b-code-base
    dtype: string
  - name: predicted_comment_ibm-granite/granite-8b-code-base
    dtype: string
  - name: masked_data_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predict_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: predicted_comment_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: masked_data_google/codegemma-7b
    dtype: string
  - name: predict_google/codegemma-7b
    dtype: string
  - name: predicted_comment_google/codegemma-7b
    dtype: string
  - name: error_codes_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: expert_accuracy_Qwen/CodeQwen1.5-7B
    dtype: string
  - name: error_codes_bigcode/starcoder2-7b
    dtype: string
  - name: expert_accuracy_bigcode/starcoder2-7b
    dtype: string
  - name: error_codes_ibm-granite/granite-8b-code-base
    dtype: string
  - name: expert_accuracy_ibm-granite/granite-8b-code-base
    dtype: string
  - name: error_codes_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: expert_accuracy_meta-llama/CodeLlama-7b-hf
    dtype: string
  - name: error_codes_google/codegemma-7b
    dtype: string
  - name: expert_accuracy_google/codegemma-7b
    dtype: string
  splits:
  - name: train
    num_bytes: 17775627
    num_examples: 500
  download_size: 7233103
  dataset_size: 17775627
configs:
- config_name: Chinese
  data_files:
  - split: train
    path: Chinese/train-*
- config_name: Dutch
  data_files:
  - split: train
    path: Dutch/train-*
- config_name: English
  data_files:
  - split: train
    path: English/train-*
- config_name: Greek
  data_files:
  - split: train
    path: Greek/train-*
- config_name: Polish
  data_files:
  - split: train
    path: Polish/train-*
---
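
The column names in the metadata above follow a regular pattern: five base columns plus five per-model columns formed as `<prefix>_<model id>`. The following is a minimal Python sketch of that naming scheme, not code from the dataset authors. The helper names (`model_columns`, `all_columns`, `load_config`) are hypothetical. `load_config` assumes the third-party `datasets` package is installed, that the Hub is reachable, and that the repository id matches the title of this page.

```python
# Model ids and column prefixes copied from the dataset metadata.
MODELS = [
    "Qwen/CodeQwen1.5-7B",
    "bigcode/starcoder2-7b",
    "ibm-granite/granite-8b-code-base",
    "meta-llama/CodeLlama-7b-hf",
    "google/codegemma-7b",
]
BASE_COLUMNS = ["file_id", "content", "repo", "path", "original_comment"]
PER_MODEL_PREFIXES = [
    "masked_data",
    "predict",
    "predicted_comment",
    "expert_accuracy",
    "error_codes",
]
LANGUAGES = ["Chinese", "Dutch", "English", "Greek", "Polish"]


def model_columns(model_id):
    """Map each prefix to the full column name for one model (hypothetical helper)."""
    if model_id not in MODELS:
        raise ValueError(f"unknown model id: {model_id}")
    return {prefix: f"{prefix}_{model_id}" for prefix in PER_MODEL_PREFIXES}


def all_columns():
    """Return every column name; the order here is illustrative, since the
    metadata lists the columns in a different order for each configuration."""
    columns = list(BASE_COLUMNS)
    for model_id in MODELS:
        columns.extend(model_columns(model_id).values())
    return columns


def load_config(language):
    """Load one language configuration (assumes `datasets` and network access)."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown configuration: {language}")
    from datasets import load_dataset  # imported lazily; third-party dependency

    return load_dataset(
        "AISE-TUDelft/multilingual-code-comments-fixed", language, split="train"
    )
```

For example, `model_columns("google/codegemma-7b")["error_codes"]` yields the column name `error_codes_google/codegemma-7b`. Because column order varies between configurations, selecting columns by name is safer than selecting by position.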

Dataset information (English summary of the metadata above):
- Configurations: Chinese, Dutch, English, Greek, Polish. Each has a single train split of 500 examples, with data files at <configuration name>/train-*.
- Features: every configuration has the same 30 features, all of data type string. The column order differs between configurations.
  - Base features: file_id, content, repo, path, original_comment
  - Per-model features, repeated for Qwen/CodeQwen1.5-7B, bigcode/starcoder2-7b, ibm-granite/granite-8b-code-base, meta-llama/CodeLlama-7b-hf, and google/codegemma-7b: masked_data_<model>, predict_<model>, predicted_comment_<model>, expert_accuracy_<model>, error_codes_<model>
- Split sizes (train):
  - Chinese: num_bytes 21631801, num_examples 500, download_size 8927665, dataset_size 21631801
  - Dutch: num_bytes 24073258, num_examples 500, download_size 9180742, dataset_size 24073258
  - English: num_bytes 20540810, num_examples 500, download_size 8130598, dataset_size 20540810
  - Greek: num_bytes 25626813, num_examples 500, download_size 9167871, dataset_size 25626813
  - Polish: num_bytes 17775627, num_examples 500, download_size 7233103, dataset_size 17775627
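
The per-configuration sizes listed above can be totalled with a few lines of Python. This is only an illustrative sketch: the figures are copied from the metadata, while the names `SPLIT_STATS`, `totals`, and `mean_bytes_per_example` are hypothetical.

```python
# (num_bytes, download_size, num_examples) per configuration, from the metadata.
SPLIT_STATS = {
    "Chinese": (21631801, 8927665, 500),
    "Dutch": (24073258, 9180742, 500),
    "English": (20540810, 8130598, 500),
    "Greek": (25626813, 9167871, 500),
    "Polish": (17775627, 7233103, 500),
}


def totals(stats):
    """Sum num_bytes, download_size, and num_examples over all configurations."""
    num_bytes = sum(entry[0] for entry in stats.values())
    download_size = sum(entry[1] for entry in stats.values())
    num_examples = sum(entry[2] for entry in stats.values())
    return num_bytes, download_size, num_examples


def mean_bytes_per_example(stats):
    """Average num_bytes per example for each configuration."""
    return {name: entry[0] / entry[2] for name, entry in stats.items()}


if __name__ == "__main__":
    total_bytes, total_download, total_examples = totals(SPLIT_STATS)
    print(f"{total_examples} examples, {total_bytes} bytes, {total_download} bytes to download")
    for name, mean in mean_bytes_per_example(SPLIT_STATS).items():
        print(f"{name}: {mean:.0f} bytes per example")
```

Across the five configurations this gives 2500 examples, 109648309 bytes in total, and 42639979 bytes to download.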
Provider:
AISE-TUDelft