FIRE14 Detection of SOurce COde Re-use

Indexed from NIAID Data Ecosystem on 2026-03-14

Download link:

https://zenodo.org/record/7357804

Description:

This data was used for the PAN shared task on source code re-use detection at FIRE 2014. The task description is available at https://pan.webis.de/fire14/pan14-web/index.html.

For the training phase, we provide an annotated corpus in which the files carry their programming language extensions. The annotations indicate whether a text fragment has been re-used and, if so, what its source is. The collection consists of source codes written in Java and C. Re-use is committed in both programming languages, but ONLY at the monolingual level. The Java collection contains 259 source codes, from 000.java to 258.java. The C collection contains 79 source codes, from 000.c to 078.c. Relevance judgements represent re-use in both directions (a→b and b→a).

In the test phase, the only annotation provided in the corpus is the programming language extension. The corpus is divided by programming language (C/C++ and Java), so no pre-processing is needed to identify the language of the source codes. Each programming language folder contains 6 folders (A1, B1, B2, C1 and C2), each holding a specific scenario with monolingual re-use. There is no re-use between scenarios, so you only need to look for re-used cases among the source code files inside each folder. Each file name consists of the name of the scenario the file belongs to plus an identifier. For example, file "B10021" belongs to scenario B1 and its identifier number is 0021. Likewise, you do not have to submit a re-used case between files "B10021" and "B20013", because the first belongs to scenario B1 and the second to B2.
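The naming rule for test files can be sketched in a few lines of Python. This is a minimal illustration, not part of the official task tooling: the helper names `parse_name` and `candidate_pairs` are hypothetical, and the sketch assumes every scenario label is exactly two characters (as in "B1") and that any file extension should be ignored.

```python
from collections import defaultdict
from itertools import combinations


def parse_name(name):
    """Split a test file name such as 'B10021' into (scenario, identifier).

    Assumption: the scenario label is the first two characters and the
    rest of the stem is the identifier; an extension, if any, is dropped.
    """
    stem = name.split(".")[0]
    return stem[:2], stem[2:]


def candidate_pairs(names):
    """Return unordered file pairs that share a scenario.

    Pairs across scenarios are never produced, since the description
    states that re-use does not occur between scenarios.
    """
    by_scenario = defaultdict(list)
    for name in names:
        scenario, _ = parse_name(name)
        by_scenario[scenario].append(name)

    pairs = []
    for scenario in sorted(by_scenario):
        pairs.extend(combinations(sorted(by_scenario[scenario]), 2))
    return pairs
```

For instance, given the files "B10021", "B20013" and "B10005", only the pair ("B10005", "B10021") would be compared, because "B20013" is the sole file from scenario B2.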

Created:

2022-12-13
