five

A Reliability Study for Evaluating Information Extraction from Radiology Reports

收藏
PubMed Central2026-05-16 收录
下载链接:
https://pmc.ncbi.nlm.nih.gov/articles/PMC61353/
下载链接
链接失效反馈
官方服务:
资源简介:
Goal: To assess the reliability of a reference standard for an information extraction task. Setting: Twenty-four physician raters from two sites and two specialities judged whether clinical conditions were present based on reading chest radiograph reports. Methods: Variance components, generalizability (reliability) coefficients, and the number of expert raters needed to generate a reliable reference standard were estimated. Results: Per-rater reliability averaged across conditions was 0.80 (95% CI, 0.79-0.81). Reliability for the nine individual conditions varied from 0.67 to 0.97, with central line presence and pneumothorax the most reliable, and pleural effusion (excluding CHF) and pneumonia the least reliable. One to two raters were needed to achieve a reliability of 0.70, and six raters, on average, were required to achieve a reliability of 0.95. This was far more reliable than a previously published per-rater reliability of 0.19 for a more complex task. Differences between sites were attributable to changes to the condition definitions. Conclusion: In these evaluations, physician raters were able to judge very reliably the presence of clinical conditions based on text reports. Once the reliability of a specific rater is confirmed, it would be possible for that rater to create a reference standard reliable enough to assess aggregate measures on a system. Six raters would be needed to create a reference standard sufficient to assess a system on a case-by-case basis. These results should help evaluators design future information extraction studies for natural language processors and other knowledge-based systems.
提供机构:
Oxford University Press
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作