Intraclass correlation coefficient results.
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Intraclass_correlation_coefficient_results_/30180214
Description:
Background
Integrating engineering design processes into science education has become a significant priority in STEM instruction. However, many science teachers face difficulties incorporating these processes due to limited pedagogical expertise. Generative artificial intelligence (GAI) tools such as ChatGPT offer potential support mechanisms by evaluating lesson plans and providing formative feedback. This study investigates the reliability and validity of GAI evaluations compared to expert assessments.
Methods
This mixed-methods study involved 43 science teachers who received professional development over four months to integrate engineering design into their lesson plans. A total of 52 lesson plans were evaluated by ChatGPT 4.5 using structured and unstructured prompts, and independently by expert mentors. Quantitative data were analyzed using the Intraclass Correlation Coefficient (ICC) and Bland-Altman methods to assess inter-rater consistency. Qualitative data were analyzed through open and deductive coding to interpret differences in evaluation rationale.
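As a rough illustration of the two quantitative methods named above, the following sketch computes an ICC and Bland-Altman limits of agreement for two raters. The record does not state which ICC form the study used; the sketch assumes ICC(3,1) (two-way mixed effects, consistency, single rater). The function names and the scores are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev


def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater (assumed form).

    ratings: one row per lesson plan, one score per rater in each row.
    """
    n = len(ratings)       # number of rated items (lesson plans)
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)      # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical scores for five lesson plans (invented for illustration only)
expert = [1, 2, 3, 4, 5]
ai = [2, 3, 4, 5, 6]

icc = icc_consistency(list(zip(expert, ai)))
bias, lower, upper = bland_altman(ai, expert)
print(icc, bias, lower, upper)
```

In this toy data the AI scores sit a constant one point above the expert scores. A consistency-type ICC ignores such a constant offset, while the Bland-Altman bias reports it directly, which is one reason to read the two statistics together.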
Results
Findings revealed high consistency between structured prompt AI evaluations and expert assessments (ICC = 0.708), while unstructured prompts showed low and non-significant agreement (ICC = 0.076). Qualitative analysis indicated that AI evaluations, particularly those using structured prompts, tend to be more positive and holistic, whereas experts offered more detailed and critical feedback. Differences were also observed in evaluating components like problem definition, testability, and interdisciplinary integration.
Conclusion
Structured AI prompts offer reliable and valid evaluation results comparable to expert assessments and could serve as scalable tools in teacher support systems. However, unstructured prompts produce inconsistent outcomes and require refinement. The study highlights both the potential and limitations of using GAI tools for pedagogical evaluation in STEM education.
Created:
2025-09-22



