
Programming Homework Dataset for Plagiarism Detection

IEEE. Updated 2020-05-08. Indexed 2026-04-17.
Download link:
https://ieee-dataport.org/open-access/programming-homework-dataset-plagiarism-detection
Description:
This dataset is intended for studying how programming style and IDE usage differ between students who plagiarise their homework and students who solve it honestly. It includes homework submitted by students in two introductory programming courses (A and B) delivered over two years (2016 and 2017). Course A is taught in C, while course B is taught in C++. In addition to the homework itself, the dataset includes full traces of all student activity and keystrokes during homework development. These traces were generated by setting the IDE to autosave after 1 second of inactivity, after which the file was committed to an SVN repository. To reduce size, these repositories were then processed into JSON files, which are what the dataset actually stores. The IDE was also configured to write output from student programs, the compiler, debugger, profiler and unit tests into separate hidden files, which were stored in the same repository. Finally, the dataset includes ground truth: homework assumed to be plagiarised because of high similarity combined with the fact that the student (or one of the students) failed to defend the homework orally.
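The autosave rule described above (save once the editor has been idle for 1 second) can be illustrated with a small sketch. This is not the dataset's tooling and does not reflect the layout of its JSON files: the function name `autosave_times` and the plain list of keystroke timestamps are hypothetical, chosen only to show how an inactivity threshold turns a keystroke stream into a sequence of save points.

```python
from typing import List


def autosave_times(event_times: List[float], idle_seconds: float = 1.0) -> List[float]:
    """Return the moments at which an idle-triggered autosave would fire.

    event_times: keystroke timestamps in seconds (hypothetical input format).
    idle_seconds: inactivity threshold; 1 second matches the description above.
    """
    times = sorted(event_times)
    saves: List[float] = []
    # A save fires after a keystroke when the next keystroke arrives
    # later than the idle threshold.
    for current, following in zip(times, times[1:]):
        if following - current > idle_seconds:
            saves.append(current + idle_seconds)
    # The last keystroke is always followed by inactivity, so it triggers a save.
    if times:
        saves.append(times[-1] + idle_seconds)
    return saves


if __name__ == "__main__":
    keystrokes = [0.0, 0.25, 0.5, 2.0, 2.25, 5.0]
    print(autosave_times(keystrokes))
```

Under this rule, a burst of continuous typing yields a single snapshot, so the number of snapshots per homework reflects pauses in work rather than the raw keystroke count.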
Provided by:
Ljubovic, Vedran
Created:
2020-05-08