A New Peer Reviewer? Comparing AI versus Human Performance in RCT Risk-of-Bias Assessment

Figshare. Updated 2026-02-20; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/_p_dir_ltr_A_New_Peer_Reviewer_Comparing_AI_versus_Human_Performance_in_RCT_Risk-of-Bias_Assessment_p_/31382911

Description:

Background: Risk-of-bias (RoB) appraisal is essential to evidence synthesis but remains time-consuming and subjective. Artificial intelligence (AI) could make systematic reviews more efficient, but its reliability in reproducing expert RoB decisions is uncertain. This study compares AI and human performance in RoB assessment of randomized controlled trials (RCTs) using the revised Joanna Briggs Institute (JBI) critical appraisal tool.

Methods: Thirteen RCTs published in orthopedic journals between 2023 and 2025 were independently rated by two human raters (expert, R1; novice, R2) and two AI models (ChatGPT-4.0, CGPT; DeepSeek-R1, DS) using the 13-domain JBI checklist. Deep-reasoning functions (e.g., Chain-of-Thought) were enabled. Appraisal consistency was assessed through inter-rater agreement, deviations from R1 (treated as the gold standard), and binary flips (e.g., "Yes" vs. "No" disagreements).

Results: The AI models showed high inter-model concordance (91%), higher than human–AI concordance (CGPT and R1: 64%; DS and R1: 68%). Both AI systems diverged significantly from the expert's judgments in interpretive domains such as allocation concealment (Q2), blinding (Q7), and overall trial structure (Q13), with deviation rates ranging from 30% to 38.5%. Reversals of binary decisions were significantly more common in the AI assessments (CGPT: 8.9%, DS: 7.7%) than in the human comparator (R2 vs. R1: 2.4%). Human raters performed better in contextual comprehension (R1–R2 agreement: 89.3%), whereas the AI systems performed better on formal, rule-based items (Q8/Q9: 100% agreement).

Conclusion: AI can reliably automate objective components of RoB appraisal but struggles with interpretive and context-dependent judgments. Combining AI pre-screening with expert review may improve the scalability of systematic reviews without compromising methodological rigor.
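The consistency measures named in the Methods (percent agreement, deviation from the reference rater, and binary flips) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the answer labels ("Yes", "No", "Unclear"), the choice of denominator for the flip rate (all rated items), and the ratings themselves are assumptions made purely for illustration and are not taken from the dataset.

```python
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of items (in %) on which two raters gave the same answer."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def binary_flip_rate(reference: Sequence[str], rater: Sequence[str]) -> float:
    """Share of items (in %) where one rater said "Yes" and the other "No".

    Assumption: the denominator is every rated item, not only the items
    that received a binary answer. The study may define it differently.
    """
    if len(reference) != len(rater) or not reference:
        raise ValueError("rating lists must be non-empty and of equal length")
    flips = sum({x, y} == {"Yes", "No"} for x, y in zip(reference, rater))
    return 100.0 * flips / len(reference)


# Hypothetical ratings for ten checklist items (NOT data from the study).
r1 = ["Yes", "No", "Unclear", "Yes", "Yes", "No", "Yes", "Yes", "Unclear", "Yes"]
ai = ["Yes", "Yes", "Unclear", "Yes", "No", "No", "Yes", "Unclear", "Unclear", "Yes"]

agreement = percent_agreement(r1, ai)
deviation = 100.0 - agreement  # deviation from the reference rater
flip_rate = binary_flip_rate(r1, ai)

print(f"agreement: {agreement:.1f}%")
print(f"deviation from reference: {deviation:.1f}%")
print(f"binary flips: {flip_rate:.1f}%")
```

In this toy example, a "Yes" against "Unclear" counts as a deviation but not as a binary flip, which is why the flip rate is lower than the deviation rate.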

Created:

2026-02-20