Inter-rater agreement in the scoring of abstracts submitted to a primary care research conference

PubMed Central | Updated 2002-03-26 | Indexed 2026-05-16

Download link:

https://pmc.ncbi.nlm.nih.gov/articles/PMC101393/

Resource description:

BACKGROUND: Checklists for peer review aim to guide referees when assessing the quality of papers, but little evidence exists on the extent to which referees agree when evaluating the same paper. The aim of this study was to investigate agreement on dimensions of a checklist between two referees when evaluating abstracts submitted for a primary care conference.

METHODS: Anonymised abstracts were scored using a structured assessment comprising seven categories. Between one (poor) and four (excellent) marks were awarded for each category, giving a maximum possible score of 28 marks. Every abstract was assessed independently by two referees, and agreement was measured using intraclass correlation coefficients. Mean total scores of abstracts accepted and rejected for the meeting were compared using an unpaired t test.

RESULTS: Across 52 abstracts, agreement between reviewers was greater for the three components relating to study design (adjusted intraclass correlation coefficients 0.40 to 0.45) than for the four components relating to more subjective elements, such as the importance of the study and the likelihood of provoking discussion (0.01 to 0.25). The mean score for accepted abstracts was significantly greater than that for rejected abstracts (17.4 versus 14.6, 95% CI for difference 1.3 to 4.1, p = 0.0003).

CONCLUSIONS: The findings suggest that inclusion of subjective components in a review checklist may result in greater disagreement between reviewers. However, in terms of overall quality scores, abstracts accepted for the meeting were rated significantly higher than those that were rejected.
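The two statistics named in the methods can be illustrated with a short Python sketch. This is not the authors' analysis: the abstract does not say which form of the intraclass correlation coefficient was used or how it was adjusted, so the sketch assumes the simplest one-way random-effects form, ICC(1,1), and a pooled-variance unpaired t statistic. The function names `icc_oneway` and `pooled_t` and all scores below are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (an assumed form, see lead-in).

    `ratings` is a list with one entry per abstract; each entry holds the
    marks (1 to 4) that each referee gave for a single checklist category.
    """
    n = len(ratings)        # number of abstracts
    k = len(ratings[0])     # number of referees per abstract
    grand = mean(score for row in ratings for score in row)
    row_means = [mean(row) for row in ratings]
    # Between-abstract and within-abstract mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (score - m) ** 2 for row, m in zip(ratings, row_means) for score in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def pooled_t(a, b):
    """Unpaired (pooled-variance) t statistic comparing two groups of totals."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical marks for one category: (referee 1, referee 2) per abstract.
category_marks = [(1, 2), (2, 1), (3, 4), (4, 3)]
print(round(icc_oneway(category_marks), 3))

# Hypothetical total scores (out of 28) for accepted and rejected abstracts.
accepted_totals = [18, 17, 19, 16]
rejected_totals = [14, 15, 13, 16]
print(round(pooled_t(accepted_totals, rejected_totals), 3))
```

An ICC near 1 means the two referees' marks for a category track each other closely, while a value near 0 means their marks are hardly more alike than marks for different abstracts. A p-value for the t statistic would come from a t distribution with `na + nb - 2` degrees of freedom, which is omitted here to keep the sketch dependency-free.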

Provider:

BMC

Created:

2002-03-26