
Automatic recognition of self-acknowledged limitations in clinical research literature

Indexed from NIAID Data Ecosystem on 2026-03-11
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.06ds7
Description:
Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.

Materials and Methods: To develop our recognition methods, we used a set of 8,431 sentences from 1,197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).

Results: Annotators had good agreement in labeling limitation sentences (Krippendorff’s α=0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs. 89.6%, 95% CI [88.1-91.1]).

Discussion: We attribute the effectiveness of the rule-based method to the highly localized and formulaic language used in reporting of limitations in clinical research publications. Experiments with training size and composition show that more data does not necessarily lead to higher accuracy in the machine learning-based approaches.

Conclusion: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
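To make the task framing concrete, the following is a minimal Python sketch of sentence-level binary classification with a structure-aware rule, followed by an accuracy estimate with a 95% confidence interval. It is an illustration only, not the authors' method: the cue phrases, the section names, the toy sentences, and the function names `is_limitation_sentence` and `accuracy_with_ci` are all assumptions made for this example, and the description does not state how the reported intervals were computed, so the normal-approximation (Wald) interval used here is also an assumption.

```python
import math
import re

# Hypothetical cue phrases; the description does not list the actual rules.
LIMITATION_CUES = re.compile(
    r"\b(limitations?|shortcomings?|caveats?)\b", re.IGNORECASE
)

# Hypothetical set of sections where limitations are assumed to be reported.
DISCUSSION_LIKE = {"discussion", "conclusion", "conclusions"}


def is_limitation_sentence(sentence: str, section: str) -> bool:
    """Return True if the sentence sits in a discussion-like section
    and contains one of the cue phrases (illustrative rule only)."""
    in_discussion = section.strip().lower() in DISCUSSION_LIKE
    return in_discussion and bool(LIMITATION_CUES.search(sentence))


def accuracy_with_ci(predicted, gold, z=1.96):
    """Accuracy with a normal-approximation (Wald) interval, clipped to [0, 1]."""
    n = len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    acc = correct / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


# Toy (sentence, section, gold label) triples invented for this sketch.
examples = [
    ("Our study has several limitations.", "Discussion", True),
    ("Patients were randomized to two arms.", "Methods", False),
    ("A key caveat is the small sample size.", "Discussion", True),
    ("The limitations of prior work motivated this trial.", "Introduction", False),
    ("Blood pressure decreased in both groups.", "Discussion", False),
    ("The sample was small, which may bias the estimates.", "Discussion", True),
]

predictions = [is_limitation_sentence(text, sec) for text, sec, _ in examples]
gold_labels = [label for _, _, label in examples]
acc, low, high = accuracy_with_ci(predictions, gold_labels)
print(f"accuracy={acc:.3f}, 95% CI=[{low:.3f}, {high:.3f}]")
```

The last toy sentence describes a limitation without using any cue word, so the rule misses it. That is the kind of case where a learned classifier could in principle help, although the results above report that the rule-based method performed best on the authors' data.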
Created:
2019-03-30