IR-Plag
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/oscarkarnalim/sourcecodeplagiarismdataset
Description:
IR-Plag is a benchmark dataset for code similarity detection. It contains code files that simulate academic plagiarism patterns at varying levels of complexity. The dataset comprises 467 code files totaling 59,201 tokens, of which 355 files are labeled as plagiarized. The dataset is small and is intended for code similarity evaluation and plagiarism detection tasks.
Provided by:
Oscar Karnalim



