
TIMSS 2011 Mathematics and Science Dataset

Source: arXiv. Updated: 2024-04-02. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2404.01799v1
Description:
This dataset was developed by Utrecht University and comprises data from the Trends in International Mathematics and Science Study (TIMSS) conducted in 2011 and 2008. It is intended for evaluating the academic proficiency of large language models (LLMs) in 8th-grade mathematics and science. The dataset contains test items covering multiple subfields of mathematics, such as algebra and geometry, as well as the corresponding domains of science. The research team applied psychometric principles during development to ensure the quality and reliability of the dataset. Its primary use is to compare the performance of language models with that of human students in mathematics and science, with the aim of advancing the use of language models in educational assessment.
Provider:
Utrecht University
Created:
2024-04-02