Data_Sheet_3_Evaluating chatGPT-4 and chatGPT-4o: performance insights from NAEP mathematics problem solving.csv
Source: frontiersin.figshare.com | Updated: 2024-09-18 | Indexed: 2025-01-16
Download link:
https://frontiersin.figshare.com/articles/dataset/Data_Sheet_3_Evaluating_chatGPT-4_and_chatGPT-4o_performance_insights_from_NAEP_mathematics_problem_solving_csv/27049744/1
Description:
This study assesses the capabilities of OpenAI’s ChatGPT-4 and ChatGPT-4o in solving mathematics problems from the National Assessment of Educational Progress (NAEP) across grades 4, 8, and 12. Results indicate that ChatGPT-4o slightly outperforms ChatGPT-4 and that both models generally surpass U.S. students’ performance across all grades, content areas, item types, and difficulty levels. However, both models perform worse on geometry and measurement than on algebra, and both struggle more with high-difficulty mathematics items. This investigation highlights the strengths and limitations of AI as a supplementary educational tool, pinpointing spatial intelligence and complex mathematical problem-solving as areas for improvement. These findings suggest that while AI has the potential to support instruction in specific mathematical areas such as algebra, careful integration and teacher-mediated strategies remain necessary in areas where AI is less effective.
Provider:
Frontiers



