
A Dataset of Prolog Submissions for Feedback Research: 7201 Programs and 200 Manual Annotations

Source: DataCite Commons. Updated 2026-04-03; indexed 2025-09-08.
Download link:
https://figshare.com/articles/dataset/A_Dataset_of_Prolog_Submissions_for_Feedback_Research_7201_Programs_and_200_Manual_Annotations/29899583/1
Description:
This dataset contains versioned Prolog program submissions from students, annotated with debugging and test-result metadata. It is intended for research on automated feedback, bug fixing, program repair, and learning analytics in logic programming education.

Files Description

- programs.jsonl.gz: 7201 Prolog submissions.
- programs_sample.json: a sample of 10 entries from programs.jsonl.gz in human-readable format.
- programs_annotated.jsonl: 200 programs from programs.jsonl.gz with additional manually created annotations.
- programs_annotated_sample.json: a sample of 10 entries from programs_annotated.jsonl in human-readable format.

Dataset Structure

Each entry in programs.jsonl.gz is a JSON object with the following fields:

- student_id: Anonymized integer identifier for the student.
- sequence_id: Integer indicating the submission order for a given assignment.
- time: UNIX timestamp of the submission.
- assignment: Path or identifier of the assignment.
- assignment_group: Higher-level grouping of assignments (e.g. "labs").
- program: The student's submitted Prolog program.
- previous_submission: The student's immediately preceding submission for the same assignment.
- previous_tests_passed: List of test cases passed by the previous submission.
- passed_tests: List of test cases passed by the current submission.
- failed_tests: List of test cases failed by the current submission.
- tests_passed_count: Number of passed tests.
- total_tests_count: Total number of tests executed.
- correct: Boolean indicating whether the submission passes all tests.
- category: Label describing the nature of the change (e.g. `"BUGFIX_CORRECT"`).
- diff: A diff string showing the changes between the previous and the current submission.
- parsed: Normalized version of the current program.
- syntax_error: Syntax error message if present, else `null`.
- interpreter_syntax_error: Syntax errors raised by the Prolog interpreter, if any.
- is_syntax_fix: Boolean indicating whether the change resolved a syntax issue.
- predicates: List of predicates defined in the current program.
- modified_predicates: Predicates modified since the previous submission.
- modified_predicates_count: Number of modified predicates.
- modified_predicates_fraction: Fraction of modified predicates relative to the total number of predicates.
- clauses: Total number of clauses in the program.
- modified_clauses: Number of modified clauses.
- modified_clauses_fraction: Fraction of modified clauses relative to the total number of clauses.

Additionally, entries in programs_annotated.jsonl have the following two fields:

- bugfix_labels: List of bug categories identified in the submission (e.g. `"CUT_ISSUE:MISSING"`).
- minimal_change: Boolean indicating whether the fix involved only minimal edits.
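The following is a minimal Python sketch for reading programs.jsonl.gz and rebuilding each student's submission history per assignment. It assumes the file is UTF-8 JSON Lines (one object per line) compressed with gzip, as the `.jsonl.gz` extension suggests. It relies only on the field names listed above. The helper names `load_jsonl_gz`, `group_histories`, and `pass_rate` are hypothetical and are not part of the dataset.

```python
from __future__ import annotations

import gzip
import json
from collections import defaultdict
from typing import Any, BinaryIO, Iterable, Iterator


def load_jsonl_gz(source: str | BinaryIO) -> Iterator[dict[str, Any]]:
    """Yield one JSON object per non-blank line of a gzip-compressed JSON Lines file.

    Assumption: the file is UTF-8 JSON Lines. `source` may be a filename or an
    open binary file object; gzip.open accepts both.
    """
    with gzip.open(source, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def group_histories(
    records: Iterable[dict[str, Any]],
) -> dict[tuple[int, str], list[dict[str, Any]]]:
    """Group submissions by (student_id, assignment), each list ordered by sequence_id."""
    histories = defaultdict(list)
    for rec in records:
        histories[(rec["student_id"], rec["assignment"])].append(rec)
    for subs in histories.values():
        subs.sort(key=lambda r: r["sequence_id"])  # submission order per the field description
    return dict(histories)


def pass_rate(rec: dict[str, Any]) -> float:
    """Fraction of executed tests that passed; 0.0 when no tests were executed."""
    total = rec["total_tests_count"]
    return rec["tests_passed_count"] / total if total else 0.0
```

For example, `group_histories(load_jsonl_gz("programs.jsonl.gz"))` returns a dict keyed by `(student_id, assignment)`. Each of its ordered lists can be scanned for the first submission whose `correct` field is true. The zero-test guard in `pass_rate` is defensive only; the description does not say whether entries with no executed tests occur.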

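For the 200 annotated programs, a similar sketch can tally the `bugfix_labels` field and count fixes flagged as `minimal_change`. It assumes programs_annotated.jsonl is uncompressed JSON Lines, as its name suggests. It also assumes labels follow the `FAMILY:DETAIL` shape of the single documented example, `"CUT_ISSUE:MISSING"`. The full label vocabulary is not documented here, and all function names are hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any, Iterable


def label_family(label: str) -> str:
    """Return the part before the first ':' (e.g. 'CUT_ISSUE' for 'CUT_ISSUE:MISSING').

    Assumption: labels follow a FAMILY:DETAIL pattern; a label without ':'
    is returned unchanged.
    """
    return label.split(":", 1)[0]


def summarize_annotations(
    records: Iterable[dict[str, Any]],
) -> tuple[Counter, Counter, int]:
    """Count full labels, label families, and entries with minimal_change == True."""
    full, families = Counter(), Counter()
    minimal = 0
    for rec in records:
        labels = rec.get("bugfix_labels") or []  # tolerate a missing or null field
        full.update(labels)
        families.update(label_family(lab) for lab in labels)
        if rec.get("minimal_change") is True:
            minimal += 1
    return full, families, minimal


def load_jsonl(path: str) -> list[dict[str, Any]]:
    """Read an uncompressed UTF-8 JSON Lines file such as programs_annotated.jsonl."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A typical call would be `summarize_annotations(load_jsonl("programs_annotated.jsonl"))`. Grouping labels by family gives a coarser view than the full labels when individual labels occur only a few times in a 200-program sample.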
Provided by:
figshare
Created:
2025-08-13