Automatic recognition of self-acknowledged limitations in clinical research literature

Source: DataONE. Updated: 2020-06-24. Indexed: 2025-05-03.

Download link:

https://search.dataone.org/view/sha256:09aa316284227753969edf0d3c60fef0510b8e1672de9bd39cfa72a6bcd1effe

Description:

Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Materials and Methods: To develop our recognition methods, we used a set of 8,431 sentences from 1,197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results: Annotators had good agreement in labeling limitation sentences...
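The semi-supervised method mentioned above (self-training to expand the training set) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dataset or its authors: the bag-of-words features, the tiny pure-Python logistic regression `TinyLogisticRegression`, the helper `self_train`, the confidence threshold of 0.8, and the toy sentences. The idea shown is only the general loop: train on labeled sentences, pseudo-label the unlabeled sentences the model is confident about, add them to the training set, and retrain.

```python
import math
from collections import Counter


def featurize(sentence):
    """Bag-of-words counts over lowercase tokens (an assumed, deliberately simple feature set)."""
    return Counter(sentence.lower().split())


class TinyLogisticRegression:
    """Hypothetical minimal logistic regression trained with full-batch gradient descent."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.weights, self.bias = {}, 0.0

    def predict_proba(self, sentence):
        # Probability that the sentence is a limitation sentence (label 1).
        z = self.bias + sum(
            self.weights.get(tok, 0.0) * cnt for tok, cnt in featurize(sentence).items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, sentences, labels):
        self.weights, self.bias = {}, 0.0  # retrain from scratch each time
        n = len(sentences)
        for _ in range(self.epochs):
            grad_w, grad_b = Counter(), 0.0
            for s, y in zip(sentences, labels):
                err = self.predict_proba(s) - y
                grad_b += err
                for tok, cnt in featurize(s).items():
                    grad_w[tok] += err * cnt
            self.bias -= self.lr * grad_b / n
            for tok, g in grad_w.items():
                self.weights[tok] = self.weights.get(tok, 0.0) - self.lr * g / n
        return self


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Self-training: repeatedly add confidently pseudo-labeled sentences and retrain."""
    sentences = [s for s, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = TinyLogisticRegression().fit(sentences, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for s in pool:
            p = model.predict_proba(s)
            if p >= threshold:
                confident.append((s, 1))
            elif p <= 1.0 - threshold:
                confident.append((s, 0))
            else:
                remaining.append(s)
        if not confident:
            break  # nothing new to learn from; stop early
        for s, y in confident:
            sentences.append(s)
            labels.append(y)
        pool = remaining
        model.fit(sentences, labels)
    return model, len(sentences)


# Toy data, invented for this sketch (1 = limitation sentence, 0 = other).
labeled = [
    ("a limitation of this study is the small sample", 1),
    ("our study has several limitations", 1),
    ("patients were recruited from two hospitals", 0),
    ("the mean age was 54 years", 0),
]
unlabeled = [
    "another limitation is the short follow-up",
    "patients were followed for two years",
]

model, n_train = self_train(labeled, unlabeled)
```

In practice the threshold trades off how many pseudo-labeled sentences are added against how noisy they are. A library classifier such as logistic regression or an SVM, the two algorithms named in the description, could take the place of the toy model.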

Created:

2025-04-19
