AI and evaluating research quality
Source: DataCite Commons. Updated: 2025-05-15. Indexed: 2025-05-17.
Download link:
https://www.openicpsr.org/openicpsr/project/229863/view
Description:
Assessing the quality of studies is crucial for meta-analyses, but it is a
time-consuming and labor-intensive procedure. While recent advances in Large
Language Models (LLMs) offer a potential solution, their ability to provide
reliable appraisals remains largely unclear. In this study, we compared how two
humans and two LLMs rated the quality of forty educational intervention studies
based on a standardized tool. Agreement on the overall quality rating ranged
from none to fair across the four raters (-.02 ≤ κ ≤ .38), with human-human
agreement the highest. However, agreement levels across the individual quality
criteria varied dramatically, and in several cases agreement between the
LLMs surpassed the human-human agreement. The
disagreements stemmed from missing information in the papers, legitimate
diverging considerations, and raters’ errors. These findings highlight both the
promise and limitations of using LLMs to evaluate the quality of educational
research.
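
The κ values quoted above are chance-corrected agreement statistics. The description does not say which variant was used, so the following is only a minimal sketch that assumes unweighted Cohen's kappa computed for one pair of raters at a time. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the dataset.

```python
from collections import Counter
import math


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items.

    Assumption: the listing does not state which kappa variant was used;
    this is the plain two-rater, nominal-category form.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")

    n = len(ratings_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement under independence, from each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return math.nan

    return (observed - expected) / (1.0 - expected)


# Made-up ratings for four studies, for illustration only.
rater_1 = ["high", "high", "low", "low"]
rater_2 = ["high", "low", "low", "low"]
print(cohen_kappa(rater_1, rater_2))  # 0.5
```

In this toy case the raters agree on three of four items (0.75 observed) while 0.5 agreement would be expected by chance, giving κ = 0.5. A value near zero or below, like the lower bound reported above, means agreement no better than chance.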
Provider:
ICPSR - Interuniversity Consortium for Political and Social Research
Created:
2025-05-15



