Programming Homework Dataset for Plagiarism Detection
Source: IEEE | Updated: 2020-05-08 | Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/open-access/programming-homework-dataset-plagiarism-detection
Description:
The dataset is intended for studying how programming style and IDE usage differ between students who plagiarise their homework and students who solve it honestly. It includes homework submitted by students in two introductory programming courses (A and B) delivered over two years (2016 and 2017). Course A is taught in the C programming language, while course B is taught in C++. In addition to the homework itself, the dataset includes full traces of all student activity and keystrokes during homework development. These traces were generated by setting the IDE to autosave after 1 second of inactivity, after which the file was committed to an SVN repository. To reduce size, these repositories were then processed into JSON files, which are what the dataset actually stores. In addition, the IDE was configured to write the output of student programs, the compiler, the debugger, the profiler and unit tests into separate hidden files, which were also stored in the repository. Finally, the dataset includes ground truth: homework assumed to be plagiarised because of high similarity and because (one of) the students failed to defend the homework orally.
Provided by:
Ljubovic, Vedran
Created:
2020-05-08



