DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6497283
Description:
Datasets

DISTANT-CTO is a weakly labelled dataset of sentences annotated with 'Intervention' and 'Comparator' entities. The dataset was obtained using the candidate generation approach described in "DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low Resource Entity Extraction Using Clinical Trials Literature".

distantcto_high_conf.txt - ds conf 1.0 (full dataset)
extraction1_pos_posnegtrail_conf09.txt - ds conf 0.9 (partial dataset)

The physio test set comprises 153 PICO-annotated abstracts of randomized controlled trials from Physiotherapy and Rehabilitation. It was used as an additional benchmark to evaluate how well the weakly annotated dataset and the NER model generalize to this sub-domain.

Utility

The dataset can be used as input for training 'Intervention' named-entity recognition (NER) models.

Availability

This directory includes:

extraction1_pos_posnegtrail_conf09.txt - This text data file contains all the weak annotations (source intervention terms mapped onto target sentences) from clinicaltrials.org (CTO) with a confidence score of 0.9 and above.
‘physio_sent_annot2POS_posnegtrail.txt’ - This data file contains manually annotated (Intervention entity) data from the physiotherapy and rehabilitation domain. Its structure is roughly similar to the one described in the ‘Description for long targets’ section. ‘Participant’ and ‘Outcome’ annotations are removed from this file.
Created:
2022-08-04