A Supervised Approach to Quantifying Sentence Similarity: With Application to Evidence Based Medicine

Source: NIAID Data Ecosystem (indexed 2026-03-08)

Download link:

https://figshare.com/articles/dataset/_A_Supervised_Approach_to_Quantifying_Sentence_Similarity_With_Application_to_Evidence_Based_Medicine_/1435176

Description:

In Evidence Based Medicine (EBM) practice, practitioners use existing evidence to make therapeutic decisions. This evidence, in the form of scientific statements, is usually found in scholarly publications such as randomised controlled trials and systematic reviews. However, finding such information in the overwhelming amount of published material is particularly challenging. Approaches have been proposed to automatically extract scientific artefacts in EBM using standardised schemas. Our work takes this stream a step forward and looks into consolidating extracted artefacts, that is, quantifying their degree of similarity based on the assumption that they carry the same rhetorical role. By semantically connecting key statements in the EBM literature, practitioners can not only find available evidence more easily but also track the effects of different treatments/outcomes across a number of related studies. We devise a regression model based on a varied set of features and evaluate it on both a general English corpus (the SICK corpus) and an EBM corpus (the NICTA-PIBOSO corpus). Experimental results show that our approach performs on par with the state of the art on general English text and achieves encouraging results on biomedical text when compared against human judgement.
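
The description does not say which features or which regressor the authors used, so the following is only a minimal sketch of the general idea of feature-based similarity regression. The three features (word overlap, character-bigram overlap, length ratio), the gradient-descent fit, the toy sentence pairs, and their scores are all illustrative assumptions; none of them come from the paper or from the SICK or NICTA-PIBOSO corpora.

```python
import re

def tokens(sentence):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def jaccard(a, b):
    """Jaccard overlap of two collections; 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def char_bigrams(sentence):
    """Character bigrams over the normalised token string."""
    text = " ".join(tokens(sentence))
    return [text[i:i + 2] for i in range(len(text) - 1)]

def features(s1, s2):
    """Hypothetical feature vector: bias, word overlap, bigram overlap, length ratio."""
    t1, t2 = tokens(s1), tokens(s2)
    longer = max(len(t1), len(t2))
    length_ratio = min(len(t1), len(t2)) / longer if longer else 1.0
    return [1.0,
            jaccard(t1, t2),
            jaccard(char_bigrams(s1), char_bigrams(s2)),
            length_ratio]

def fit_linear(X, y, lr=0.1, epochs=5000):
    """Least-squares linear regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, row)) - target
            for j, xj in enumerate(row):
                grad[j] += err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def predict(w, s1, s2):
    """Predicted similarity score for a sentence pair."""
    return sum(wi * xi for wi, xi in zip(w, features(s1, s2)))

# Made-up training pairs with made-up scores on an arbitrary 1-5 scale.
TOY_PAIRS = [
    ("A man is playing a guitar", "A man is playing a guitar", 5.0),
    ("A man is playing a guitar", "A person plays a guitar", 4.0),
    ("A dog runs in the park", "A dog is running through a park", 4.2),
    ("A woman is slicing an onion", "A child is riding a bike", 1.2),
    ("The drug reduced blood pressure", "Blood pressure fell with the drug", 4.0),
    ("The drug reduced blood pressure", "A cat sleeps on the sofa", 1.0),
]

X = [features(a, b) for a, b, _ in TOY_PAIRS]
y = [score for _, _, score in TOY_PAIRS]
w = fit_linear(X, y)

print(predict(w, "A man is playing a guitar", "A man plays the guitar"))
```

A real system would replace the toy pairs with human-annotated similarity judgements and would typically be evaluated by correlating predictions with those judgements on held-out data.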

Created:

2015-06-03
