
AI and evaluating research quality

DataCite Commons | Updated: 2025-05-15 | Indexed: 2025-05-17
Download link:
https://www.openicpsr.org/openicpsr/project/229863/view
Description:
Assessing the quality of studies is crucial for meta-analyses, but this is a time-consuming and labor-intensive procedure. While recent advances in Large Language Models (LLMs) offer a potential solution, their ability to provide reliable appraisals remains largely unclear. In this study, we compared how two humans and two LLMs rated the quality of forty educational intervention studies based on a standardized tool. The agreement on the overall quality rating ranged from none to fair across the four raters (-.02 ≤ κ ≤ .38), with human-human agreement being the highest. However, agreement levels with respect to different quality criteria varied dramatically, and in several cases, the agreements among the LLMs surpassed the human-human agreement. The disagreements stemmed from missing information in the papers, legitimate diverging considerations, and raters’ errors. These findings highlight both the promise and limitations of using LLMs to evaluate the quality of educational research.
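For readers unfamiliar with the κ statistic reported above, the following is a minimal sketch of how pairwise agreement between raters could be computed. It assumes unweighted Cohen's kappa; the description reports only κ and does not say which variant the authors used. The rater names, the rating categories, and the ratings themselves are invented for illustration and are not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and equally long")
    n = len(a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and this sketch reports it as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappa(ratings):
    """Kappa for every pair of raters in a {rater_name: labels} mapping."""
    return {
        (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
        for r1, r2 in combinations(ratings, 2)
    }


# Hypothetical ratings for six studies (NOT data from this dataset).
ratings = {
    "human_1": ["high", "medium", "low", "medium", "high", "low"],
    "human_2": ["high", "medium", "medium", "medium", "high", "low"],
    "llm_1": ["medium", "medium", "low", "high", "high", "low"],
    "llm_2": ["high", "low", "low", "medium", "medium", "medium"],
}

if __name__ == "__main__":
    for (r1, r2), kappa in pairwise_kappa(ratings).items():
        print(f"{r1} vs {r2}: kappa = {kappa:.2f}")
```

With four raters this yields six pairwise values, which is the kind of range the description summarizes as -.02 ≤ κ ≤ .38. A value near 0 means agreement no better than chance, and negative values mean agreement below chance.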
Provider:
ICPSR - Interuniversity Consortium for Political and Social Research
Created:
2025-05-15