Quantifying interobserver variability and its anatomical distribution in routine echocardiography: consequences for metric reliability and statistical power
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Quantifying_interobserver_variability_and_its_anatomical_distribution_in_routine_echocardiography_consequences_for_metric_reliability_and_statistical_power/29898572
Description:
Manual delineation of cardiac structures in echocardiography is central to clinical assessment, yet interobserver variability remains poorly characterized in real-world settings. This variability can undermine the reliability of clinical metrics and compromise the design of studies with echocardiography-based endpoints. This study aims to quantify interobserver variability in routine echocardiography, evaluate its impact on clinical metrics and statistical power, and identify anatomical contributors to segmentation error. End-systolic and end-diastolic frames from apical 4- and 2-chamber views in transthoracic sequences were manually segmented by experienced sonographers. Two complementary experiments were conducted: (a) a dataset of 628 patients, of which 287 patient studies were annotated by two observers, and (b) a subset of 10 studies annotated by 35 observers and an expert committee. Interobserver variability was assessed using intraclass correlation coefficients (ICC), coefficients of variation (CV), and a simulation-based analysis of statistical power. A model-derived proxy for CV was introduced to address limitations in sparsely annotated designs. Observer variance components were reported for 20 clinical metrics. Key clinical metrics, including left ventricular ejection fraction and fractional area change, showed low reliability (ICC < 0.50), even after accounting for wide ranges of population variance. Observer variability significantly inflated sample size requirements, with power reductions of up to 50% in typical trial scenarios. Anatomical analysis revealed systematic error patterns, with the highest discrepancies in apical segments, suggesting that targeted training and standardized protocols may improve reproducibility. Our results offer quantitative references for designing more robust clinical studies and highlight the need for automated or semi-automated segmentation tools in routine practice.
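The abstract names three quantities (ICC, CV, and simulation-based power) without giving formulas. The following is a minimal Python sketch of how such quantities are commonly computed, assuming the one-way random-effects single-observer ICC(1,1), a root-mean-square within-subject CV, and a two-arm trial in which each measurement equals the true patient value plus independent, additive observer error. The function names and all numbers are hypothetical; this is not the authors' analysis code, the ICC form they used is not stated here, and their model-derived CV proxy is not reproduced.

```python
"""Sketch: interobserver reliability (ICC, CV) and its effect on power.

Assumptions (hypothetical, not the dataset authors' code):
- ICC is the one-way random-effects, single-observer form, ICC(1,1).
- CV is the within-subject CV: RMS of per-subject SDs over the grand mean.
- Power is estimated by Monte Carlo for a two-arm comparison, using a
  large-sample z approximation to the two-sample t test.
"""
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def icc_oneway(ratings):
    """ICC(1,1) for an n-subjects x k-observers table (list of equal-length rows)."""
    n, k = len(ratings), len(ratings[0])
    row_means = [_mean(row) for row in ratings]
    grand = _mean(row_means)
    # One-way ANOVA mean squares: between subjects and within subjects.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def within_subject_cv(ratings):
    """Root-mean-square within-subject SD divided by the grand mean."""
    pooled_var = _mean([_var(row) for row in ratings])
    grand = _mean([x for row in ratings for x in row])
    return math.sqrt(pooled_var) / grand


def sample_size_inflation(sd_between, sd_observer):
    """Factor by which additive observer error inflates n for fixed power (= 1/ICC)."""
    return (sd_between ** 2 + sd_observer ** 2) / sd_between ** 2


def simulated_power(n_per_arm, delta, sd_between, sd_observer,
                    n_sims=2000, z_crit=1.959964, seed=0):
    """Fraction of simulated two-arm trials whose |z| exceeds z_crit.

    Measured value = true subject value + independent observer error, so
    observer noise widens the measured spread and lowers power.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, sd_between) + rng.gauss(0.0, sd_observer)
             for _ in range(n_per_arm)]
        b = [rng.gauss(delta, sd_between) + rng.gauss(0.0, sd_observer)
             for _ in range(n_per_arm)]
        se = math.sqrt(_var(a) / n_per_arm + _var(b) / n_per_arm)
        if abs(_mean(b) - _mean(a)) / se > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical LVEF-like values (%): 4 patients x 2 observers.
    table = [[55.0, 60.0], [40.0, 47.0], [62.0, 58.0], [35.0, 42.0]]
    print(f"ICC(1,1) = {icc_oneway(table):.2f}")
    print(f"within-subject CV = {within_subject_cv(table):.3f}")
    # Same true difference, without and with observer noise.
    print("power, no observer error:", simulated_power(50, 4.0, 6.0, 0.0))
    print("power, observer SD = 6:  ", simulated_power(50, 4.0, 6.0, 6.0))
```

Under this additive-error model the measured variance is σ_b² + σ_o², so the ICC equals σ_b² / (σ_b² + σ_o²) and the sample size needed for a fixed power grows by roughly 1/ICC. For example, `sample_size_inflation(6, 6)` returns 2.0: an ICC of 0.5 roughly doubles the required sample size, which is the kind of inflation the simulation above makes visible as lost power at a fixed n.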
Created: 2025-08-13