Automatic recognition of self-acknowledged limitations in clinical research literature

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.06ds7

Description:

Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.

Materials and Methods: To develop our recognition methods, we used a set of 8,431 sentences from 1,197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).

Results: Annotators had good agreement in labeling limitation sentences (Krippendorff’s α=0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs. 89.6%, 95% CI [88.1-91.1]).

Discussion: We attribute the effectiveness of the rule-based method to the highly localized and formulaic language used in reporting of limitations in clinical research publications. Experiments with training size and composition show that more data does not necessarily lead to higher accuracy in the machine learning-based approaches.

Conclusion: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
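To make the sentence-level binary classification task concrete, here is a minimal Python sketch of what a rule-based classifier driven by document structure might look like. It is not the authors' rule set, which this description does not list. The `Sentence` type, the cue patterns, and the choice of sections are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns: the dataset description does not list the
# actual rules, so these are illustrative assumptions only.
HEADING_CUE = re.compile(r"\blimitations?\b", re.IGNORECASE)
SENTENCE_CUE = re.compile(
    r"\b(limitations?|shortcomings?|weakness(es)?)\b", re.IGNORECASE
)
# Assumption: limitations are mostly reported in discussion-like sections.
DISCUSSION_SECTION = re.compile(r"\b(discussion|conclusions?)\b", re.IGNORECASE)


@dataclass
class Sentence:
    """A sentence plus the heading of the section it appears in."""
    section: str
    text: str


def is_limitation(sentence: Sentence) -> bool:
    """Binary decision: does this sentence discuss a self-acknowledged limitation?"""
    # Rule 1: every sentence under a "Limitations"-style heading is positive.
    if HEADING_CUE.search(sentence.section):
        return True
    # Rule 2: inside discussion-like sections, require an explicit cue word.
    in_discussion = DISCUSSION_SECTION.search(sentence.section) is not None
    return in_discussion and SENTENCE_CUE.search(sentence.text) is not None


def accuracy(predicted: list[bool], gold: list[bool]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

A rule set of this kind depends on limitation statements being localized and formulaic, which is the property the Discussion credits for the rule-based method's performance. Any real rule set would need to be tuned against the annotated sentences.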

Created:

2019-03-30
