
FIRE14 Detection of SOurce COde Re-use

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7357804
Description:
This data was used for the PAN shared task on source code re-use detection at FIRE 2014. The task description is available at https://pan.webis.de/fire14/pan14-web/index.html.

For the training phase, we provide an annotated corpus in which the files carry their programming language extensions. The annotations state whether a text fragment has been re-used and, if so, what its source is. The collection consists of source codes written in Java and C. Re-use is committed in both programming languages, but ONLY at the monolingual level. The Java collection contains 259 source codes, from 000.java to 258.java. The C collection contains 79 source codes, from 000.c to 078.c. The relevance judgements represent re-use in both directions (a→b and b→a).

In the test phase, the only annotation provided in the corpus is the programming language extension. The corpus is divided by programming language (C/C++ and Java), so no pre-processing is needed to identify the programming language of the source codes. Each programming language folder contains 6 folders (listed here as A1, B1, B2, C1 and C2), each holding a specific scenario with monolingual re-use. There is no re-use between scenarios, so you only need to look for re-used cases among the source code files inside each folder.

Each file name consists of the name of the scenario to which the file belongs plus an identifier. For example, file "B10021" belongs to scenario B1 and its identifier number is 0021. Because re-use cannot exist between source codes from different scenarios, you do not have to submit a re-used case between files "B10021" and "B20013": the first belongs to scenario B1 and the second to B2.
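The naming rule and the within-scenario restriction for the test files can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`parse_name`, `candidate_pairs`, `both_directions`) are hypothetical, the regular expression assumes a scenario label of one letter plus one digit followed by a numeric identifier, and the handling of a trailing extension is a guess, since the description does not say whether test file names include one.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed pattern: scenario label (one letter + one digit), then a numeric identifier.
NAME_RE = re.compile(r"^([A-Z]\d)(\d+)$")


def parse_name(name):
    """Split a test file name such as 'B10021' into (scenario, identifier)."""
    stem = name.rsplit(".", 1)[0]  # drop an extension such as .java or .c, if present
    match = NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.group(1), match.group(2)


def candidate_pairs(names):
    """Yield unordered pairs of files that belong to the same scenario."""
    by_scenario = defaultdict(list)
    for name in names:
        scenario, _ = parse_name(name)
        by_scenario[scenario].append(name)
    for scenario in sorted(by_scenario):
        # Pairs are built only inside one scenario, never across scenarios.
        yield from combinations(sorted(by_scenario[scenario]), 2)


def both_directions(pairs):
    """Expand unordered pairs into both directions (a→b and b→a)."""
    expanded = set()
    for a, b in pairs:
        expanded.add((a, b))
        expanded.add((b, a))
    return sorted(expanded)
```

With these hypothetical helpers, `parse_name("B10021")` returns `("B1", "0021")`, and `candidate_pairs(["B10021", "B20013", "B10007"])` yields only the pair of B1 files, so "B10021" and "B20013" are never compared. A real detector would then score each candidate pair with a similarity measure of its own choosing, which this sketch leaves out.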
Created:
2022-12-13