mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education
Source: DataCite Commons · Updated: 2024-09-11 · Indexed: 2025-04-16
Download link:
https://rodare.hzdr.de/record/3137
Resource description:
Large Language Models such as ChatGPT have the potential to revolutionize academic teaching in physics, much as the electronic calculator, the home computer, and the internet did. AI models are patient, produce answers tailored to a student’s needs, and are accessible whenever needed. Those involved in academic teaching face a number of questions: How reliable are publicly accessible models at answering? How does the language of the question affect the models’ performance? And how well do the models perform on more difficult tasks beyond retrieval? To address these questions, we benchmark a number of publicly available models on the mlphys101 dataset, a new set of 823 university-level MC5 (multiple-choice) questions and answers released alongside this work. While the original questions are in English, we employ GPT-4 to translate them into various other languages, followed by revision and refinement by native speakers. Our findings indicate that state-of-the-art models perform well on questions involving the replication of facts, definitions, and basic concepts, but struggle with multi-step quantitative reasoning. This aligns with existing literature that highlights the challenges LLMs face in mathematical and logical reasoning tasks. We conclude that the most advanced current LLMs are a valuable addition to the academic curriculum and that LLM-powered translations are a viable method to increase the accessibility of materials, but that their utility for more difficult quantitative tasks remains limited.
The dataset is available here in English only and will be removed once the mlphys101 publication has been accepted and released to the public.
Provider:
Rodare
Created:
2024-09-10