SelfCode 2.0: Annotated Corpus of Student Self-Explanations to Introductory JAVA Programs in Computer Science
Indexed in NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10912668
Dataset Description: This dataset was collected during a lab study on introductory Java programming conducted in Spring 2022. In the experimental condition of the study, students had to provide line-by-line explanations of four Java programs. The Java programs were selected from the examples available in the PCEX Worked Examples interface. The collected explanations were then split by attempt number. Students could make two attempts, guided by feedback provided through the PCEX interface. In their third attempt, they filled in the blanks to complete an explanation of the given line of code. This dataset contains only the annotated student explanations. The explanations were annotated for correctness (binary rating, 0 or 1), completeness (binary rating, 0 or 1), and similarity (rating scale 1 to 5).
Correctness: given the line of code and its context in the program, whether the student explanation covers **only** the topics relevant to that line of code.
Completeness: given the line of code and its context in the program, whether the student explanation covers **all** the topics relevant to that line of code.
Similarity: given the line of code, its context in the program, and an expert explanation of the line, the student explanation is compared with the expert explanation on a scale from 1 to 5, defined as follows:
1 - Expert and student explanations are very different.
2 - Expert and student explanations are somewhat alike, but there are major differences in the concepts/topics explained.
3 - Expert and student explanations are similar, but there are differences in the concepts/topics explained.
4 - Expert and student explanations are similar, with few differences in the concepts/topics explained.
5 - Expert and student explanations are very similar.
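The two binary criteria above can be made concrete with a small sketch. It assumes, hypothetically, that the topics relevant to a line and the topics mentioned in a student explanation are available as sets. The dataset itself provides only the human ratings, not such topic sets. Under that assumption, correctness means the student's topics are a subset of the relevant ones ("only"), and completeness means the relevant topics are a subset of the student's ("all"). The `Annotation` class, `rate_topics` function, field names, and topic labels are illustrative, not the dataset's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated student explanation (hypothetical schema, not the dataset's)."""
    correctness: int   # 1 if the explanation covers only relevant topics, else 0
    completeness: int  # 1 if the explanation covers all relevant topics, else 0
    similarity: int    # 1 (very different) .. 5 (very similar) vs. an expert explanation

    def __post_init__(self) -> None:
        # Enforce the rating ranges described in the rubric.
        if self.correctness not in (0, 1) or self.completeness not in (0, 1):
            raise ValueError("correctness and completeness must be 0 or 1")
        if not 1 <= self.similarity <= 5:
            raise ValueError("similarity must be in the range 1..5")


def rate_topics(student_topics: set[str], relevant_topics: set[str]) -> tuple[int, int]:
    """Return (correctness, completeness) under a set-based reading of the rubric."""
    correct = int(student_topics <= relevant_topics)   # mentions only relevant topics
    complete = int(relevant_topics <= student_topics)  # mentions all relevant topics
    return correct, complete


# Hypothetical topic sets for the line `private int y;`
relevant = {"instance variable", "private access", "int type"}
print(rate_topics({"instance variable", "int type"}, relevant))  # (1, 0)
print(rate_topics(relevant | {"constructor"}, relevant))         # (0, 1)
```

Under this reading, an explanation can be correct but incomplete, or complete but incorrect, which is why the two ratings are recorded separately.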
Overall, 3,000 single attempts (corresponding to 40 student explanation submissions) were annotated as pairs with various expert explanations.
Dataset Summary:
| Explanation Type | N | Definition |
|---|---|---|
| Experts | 2 | Source code line-by-line explanations by experts |
| Students | 60 (40 annotated) | Source code line-by-line explanations by students |
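COUNT of std_sent_count

| Row \ std_sent_count | 1 | 2 | 3 | 4 | 5 | 6 | Grand Total |
|---|---|---|---|---|---|---|---|
| 1 | 1854 | 367 | 245 | 107 | 34 | 33 | 2640 |
| 2 | 222 | 46 | 40 | 12 | 6 | 6 | 332 |
| 3 | 21 | 5 | 5 | 5 | 1 | 2 | 39 |
| 4 | 2 | 1 | 2 | 3 | | | 8 |
| Grand Total | 2099 | 419 | 292 | 127 | 41 | 41 | 3019 |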
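The Grand Total row and column of the COUNT of std_sent_count pivot table are plain marginal sums of the cell counts. The sketch below recomputes them from the cell values as a consistency check. It treats the two empty cells in row 4 as zero. The function name `add_margins` is illustrative.

```python
def add_margins(counts: list[list[int]]) -> tuple[list[int], list[int], int]:
    """Return (row totals, column totals, grand total) for a rectangular count table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]  # transpose, then sum each column
    return row_totals, col_totals, sum(row_totals)


# Cell values from the pivot table (rows 1-4, std_sent_count 1-6); empty cells entered as 0.
counts = [
    [1854, 367, 245, 107, 34, 33],
    [222, 46, 40, 12, 6, 6],
    [21, 5, 5, 5, 1, 2],
    [2, 1, 2, 3, 0, 0],
]
rows, cols, total = add_margins(counts)
print(rows)   # [2640, 332, 39, 8]
print(cols)   # [2099, 419, 292, 127, 41, 41]
print(total)  # 3019
```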
Sample Data:
Program: PointTester; Line number: 12; Line code: private int y;
Expert1: Every object of the Point class will have its own y-coordinate. Therefore, we need to declare an instance variable for the class to store the y-coordinate of the point. We declare it as int because we want to have integer coordinates for the point. Note that an instance variable is a variable defined in a class, for which each instantiated object of the class has a separate copy, or instance.
Expert2: The instance variables are declared as private to prevent direct access to them from outside the class. In this way, no unexpected modifications to a Point object’s data are possible.
Student1: initialize a private value inside the point class with no value yet
Student2: Declares the private int variable y.
Student3: Creates a private int that can only be accessed by class Point called int y
...
Student59: private variable used to store the value entered into the value of the ycoordinate
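The similarity rating compares a student explanation with an expert explanation of the same line, so annotation works on student-expert pairs. The sketch below forms such pairs for one line of code. It assumes that every student explanation is paired with every expert explanation for that line. The exact pairing used for the corpus is not specified here. The `ExplanationPair` type, `make_pairs` function, and field names are hypothetical. The texts are shortened excerpts from the sample above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExplanationPair:
    """One student-expert pair for a single line of code (hypothetical schema)."""
    program: str
    line_number: int
    expert_id: str
    student_id: str
    expert_text: str
    student_text: str


def make_pairs(program: str, line_number: int,
               experts: dict[str, str], students: dict[str, str]) -> list[ExplanationPair]:
    """Cross every expert explanation with every student explanation for one line."""
    return [
        ExplanationPair(program, line_number, e_id, s_id, e_text, s_text)
        for (e_id, e_text), (s_id, s_text) in product(experts.items(), students.items())
    ]


experts = {
    "Expert1": "Every object of the Point class will have its own y-coordinate.",
    "Expert2": "The instance variables are declared as private to prevent direct access to them from outside the class.",
}
students = {
    "Student1": "initialize a private value inside the point class with no value yet",
    "Student2": "Declares the private int variable y.",
}
pairs = make_pairs("PointTester", 12, experts, students)
print(len(pairs))                               # 4
print(pairs[0].expert_id, pairs[0].student_id)  # Expert1 Student1
```

With 2 experts and n student explanations per line, this scheme yields 2 × n pairs for that line.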
Kappa Scores:
| Round | Row Numbers | Correctness Rating Agreement (%) | Correctness Rating Kappa | Sufficiency Rating Agreement (%) | Sufficiency Rating Kappa |
|---|---|---|---|---|---|
| 1 | 1000-1432 | 92.9 | 0.365 | 0.708 | -0.0123 |
| 2 | 1432-1864 | 94.2 | 0.263 | 77.6 | 0.329 |
| 3 | 1864-1964 | 75.3 | 0 | 70.3 | 0.299 |
| 4 | 1964-2064 | 86 | 0.108 | 74.7 | 0.275 |
| 5 | 2064-2264 | 95.5 | -0.0158 | 81.5 | 0.312 |
| 6 | 2264-2464 | 83.5 | 0.039 | 86.5 | 0.648 |
| 7 | 2464-2864 | 92 | 0.103 | 74.5 | 0.188 |
| 8 | 2864-3005 | 86.5 | -0.026 | 72.3 | 0.117 |
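Several rounds in the Kappa Scores table combine high percentage agreement with a kappa near or below zero. Round 5, for example, has 95.5% correctness agreement and a kappa of -0.0158. The source does not name the kappa variant. Assuming Cohen's kappa for two annotators, this pattern is expected when one rating dominates: kappa = (p_o - p_e) / (1 - p_e). When nearly all ratings share one label, chance agreement p_e approaches the observed agreement p_o, and kappa becomes small or negative. The sketch below computes both quantities for hypothetical binary ratings. The function name `agreement_and_kappa` is illustrative.

```python
import math


def agreement_and_kappa(a: list[int], b: list[int]) -> tuple[float, float]:
    """Percent agreement and Cohen's kappa for two annotators' binary (0/1) ratings."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    p_a1, p_b1 = sum(a) / n, sum(b) / n          # each annotator's rate of 1s
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)  # agreement expected by chance
    if p_e == 1:
        # Both annotators used a single identical label: kappa is undefined.
        return 100 * p_o, math.nan
    return 100 * p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for 100 items: both annotators give 1 on 90 items and split on 10.
a = [1] * 90 + [1] * 5 + [0] * 5
b = [1] * 90 + [0] * 5 + [1] * 5
pct, kappa = agreement_and_kappa(a, b)
print(f"{pct:.1f}% agreement, kappa = {kappa:.3f}")  # 90.0% agreement, kappa = -0.053
```

Here each annotator rates 95% of items as 1, so p_e = 0.95 × 0.95 + 0.05 × 0.05 = 0.905. That exceeds the observed p_o = 0.90, which makes kappa negative despite 90% raw agreement.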
Citation Format: If you use this dataset in your project, please cite:
Lekshmi-Narayanan, A.-B., Chapagain, J., Brusilovsky, P., & Rus, V. (2023). SelfCode 2.0: Annotated Corpus of Student Self-Explanations to Introductory JAVA Programs in Computer Science [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10912669
Acknowledgements: This project was funded as part of NSF Award #1822752.
Created: 2025-03-31



