AISE-TUDelft/MIA-public
Source: Hugging Face. Updated 2025-11-04; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/AISE-TUDelft/MIA-public
Description:
The dataset provides per-file details (ID, file name, path, content, size, language, extension, number of lines, average line length, maximum line length, and alphanumeric fraction) together with repository metadata (repository name, star count, fork count, open-issue count, license type, and extraction date). It also records file duplication, distinguishing exact duplicates from near duplicates. The dataset is split into a training set and a test set of 25,000 examples each (138 MB per split). The download size is 88 MB and the total dataset size is 276 MB.
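This page does not define the per-file metrics (number of lines, average line length, maximum line length, alphanumeric fraction). As an illustration only, the sketch below assumes the definitions commonly used in code-dataset pipelines such as The Stack. The dataset authors may compute these values differently. `compute_file_stats` and `FileStats` are hypothetical names and are not part of the dataset or of any library.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical container for per-file metrics (not the dataset's schema)."""
    lines: int
    avg_line_length: float
    max_line_length: int
    alphanum_fraction: float


def compute_file_stats(content: str) -> FileStats:
    """Derive per-file metrics from raw file content.

    Assumption: definitions follow common code-dataset practice; the
    MIA-public authors may use different formulas.
    """
    lines = content.splitlines()
    # Guard against empty files so mean/max are well defined.
    line_lengths = [len(line) for line in lines] or [0]
    n_chars = len(content)
    # Count characters for which str.isalnum() is True (letters and digits).
    alnum = sum(ch.isalnum() for ch in content)
    return FileStats(
        lines=len(lines),
        avg_line_length=sum(line_lengths) / len(line_lengths),
        max_line_length=max(line_lengths),
        alphanum_fraction=alnum / n_chars if n_chars else 0.0,
    )


if __name__ == "__main__":
    print(compute_file_stats("ab\ncdef\n"))
```

For the input `"ab\ncdef\n"` this yields 2 lines, an average line length of 3.0, a maximum line length of 4, and an alphanumeric fraction of 0.75 (6 of 8 characters, since newlines count toward the total).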
Provided by:
AISE-TUDelft



