
A Dataset of Prolog Submissions for Feedback Research: 7201 Student Programs, 200 Manual Annotations and 16000 Synthetic Instances

Source: DataCite Commons. Updated: 2026-04-03. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/A_Dataset_of_Prolog_Submissions_for_Feedback_Research_7201_Programs_and_200_Manual_Annotations/29899583
Description:
This dataset contains versioned Prolog program submissions from students, annotated with debugging and test result metadata. It is intended for research on automated feedback, bug fixing, program repair, and learning analytics in logic programming education.

Files Description

programs.jsonl.gz: contains 7201 Prolog submissions.
programs_sample.json: contains a sample of 10 entries from programs.jsonl.gz in human-readable format.
programs_annotated.jsonl: contains 200 programs from programs.jsonl.gz with additional manually created annotations.
programs_annotated_sample.json: contains a sample of 10 entries from programs_annotated.jsonl in human-readable format.
synthetic.jsonl: contains a collection of synthetic Prolog instances generated by applying mutations to student submissions, annotated with debugging and test result metadata.

Dataset Structure

programs.jsonl.gz

Each entry in programs.jsonl.gz is a JSON object with the following fields:

student_id: Anonymized integer identifier for the student.
sequence_id: Integer indicating the submission order for a given assignment.
time: UNIX timestamp of the submission.
assignment: Path or identifier of the assignment.
assignment_group: Higher-level grouping of assignments (e.g. "labs").
program: The student's submitted Prolog program.
previous_submission: The student's immediately prior submission for the same assignment.
previous_tests_passed: List of test cases passed in the previous submission.
passed_tests: List of test cases passed by the current submission.
failed_tests: List of test cases that failed in the current submission.
tests_passed_count: Count of passed tests.
total_tests_count: Total number of tests executed.
correct: Boolean indicating whether the submission passes all tests.
category: Label describing the nature of the change (e.g. "BUGFIX_CORRECT").
diff: A diff string showing changes between the previous and current submission.
parsed: Normalized version of the current program.
syntax_error: Syntax error message if present (else null).
interpreter_syntax_error: Syntax errors raised by the Prolog interpreter (if any).
is_syntax_fix: Boolean indicating if the fix resolved a syntax issue.
predicates: List of predicates defined in the current program.
modified_predicates: Predicates modified since the previous submission.
modified_predicates_count: Number of predicates modified.
modified_predicates_fraction: Fraction of modified predicates relative to total.
clauses: Total number of clauses in the program.
modified_clauses: Number of modified clauses.
modified_clauses_fraction: Fraction of modified clauses.

programs_annotated.jsonl

Entries in programs_annotated.jsonl include all fields from programs.jsonl.gz, plus:

bugfix_labels: List of labeled bug categories identified in the submission (e.g. "CUT_ISSUE:MISSING").
minimal_change: Boolean indicating whether the fix involved minimal edits.

synthetic.jsonl

Each entry in synthetic.jsonl is a JSON object with the following fields:

student_id: Integer identifier of the original student submission used as the base.
sequence_id: Integer indicating the submission order for the original assignment.
assignment: Path or identifier of the assignment.
assignment_group: Higher-level grouping of assignments (e.g. "labs").
original: The original correct Prolog program before mutation.
mutations: List of mutations applied, each specifying the affected clause index and the mutation type (e.g. "BugType.WRONG_PREDICATE_NAME").
mutated: The resulting Prolog program after mutation.
diff: A standard diff string between the original and mutated program.
lazy_diff: A simplified diff omitting line numbers.
modified_clauses: Clause identifiers that were modified by the mutation.
extra_clauses: Clause identifiers introduced as a side effect of the mutation.
incomplete_predicates: Predicates rendered incomplete by the mutation.
modified_terms: Terms within modified clauses that were altered.
extra_terms: Terms introduced in extra clauses.
incomplete_clauses: Clauses rendered incomplete by the mutation.
failed_tests: List of test cases failed by the mutated program.
passed_tests: List of test cases passed by the mutated program.
valid_clauses: Clause identifiers that remain syntactically and semantically valid after mutation.
id_clause_pairs: List of [clause_id, clause_text] pairs for all clauses in the mutated program.
faulty_clauses_actually: List of clause texts considered faulty in the mutated program.
relevant_predicates: Predicates relevant to the assignment solution.
mutated_sliced: A slice of the mutated program relevant to the failing tests.
diff_sliced: Diff between the original and the sliced mutated program.
lazy_diff_sliced: Simplified version of diff_sliced.
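The files described above are JSON Lines, so they can be read one record per line. The following is a minimal Python sketch, not an official loader for this dataset. The helper names (read_jsonl, category_counts, failing_submissions, label_clauses) are hypothetical. The script assumes that the files sit in the working directory and that entries of faulty_clauses_actually match the clause_text values in id_clause_pairs exactly, which the description does not guarantee.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line; handles both .gz and plain files."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def category_counts(records):
    """Count submissions per `category` label (e.g. "BUGFIX_CORRECT")."""
    return Counter(record.get("category") for record in records)


def failing_submissions(records):
    """Keep the entries whose `correct` flag is false or missing."""
    return [record for record in records if not record.get("correct")]


def label_clauses(entry):
    """Pair each clause id of a synthetic entry with a faulty/not-faulty flag.

    Assumption: clause texts in `faulty_clauses_actually` are compared to the
    texts in `id_clause_pairs` by exact string equality.
    """
    faulty = set(entry.get("faulty_clauses_actually", []))
    return [
        (clause_id, clause_text in faulty)
        for clause_id, clause_text in entry.get("id_clause_pairs", [])
    ]


if __name__ == "__main__":
    source = Path("programs.jsonl.gz")  # assumed local copy of the file
    if source.exists():
        records = list(read_jsonl(source))
        print(len(records), "submissions loaded")
        print(category_counts(records).most_common(5))
        print(len(failing_submissions(records)), "submissions not fully correct")
```

The same read_jsonl helper works for programs_annotated.jsonl and synthetic.jsonl because it picks the opener from the file suffix. label_clauses is intended for entries of synthetic.jsonl, where it yields per-clause labels that could serve as a starting point for fault localization experiments.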
Provider:
figshare
Created:
2025-08-13