five

NCBI; BC5CDR; i2b2 2010; HPRD50; AIMed; MedNLI

Source: ieee-dataport.org (indexed 2025-01-16)
Download link:
https://ieee-dataport.org/documents/ncbi-bc5cdr-i2b2-2010-hprd50-aimed-mednli
Description:
NCBI: The NCBI dataset is a biomedical corpus of 793 PubMed abstracts, each manually annotated with disease mentions and their corresponding concepts, providing a high-quality gold standard for disease name recognition and normalization research.

BC5CDR-disease: The BioCreative V Chemical-Disease Relation (BC5CDR) corpus is annotated for biomedical named entity recognition and relation extraction. It consists of 1500 PubMed articles annotated with disease and chemical entities and their interactions. In this paper, we consider only the disease entities of the named entity recognition task.

i2b2 2010: The i2b2 2010 dataset was sourced from three distinct medical institutions and annotated by medical professionals with eight relation types involving medical problems, treatments, and tests: TrIP, TrWP, TrCP, TrAP, TrNAP, PIP, TeRP, and TeCP.

HPRD50: The HPRD50 dataset is sourced from the HPRD database and is used for studying human protein-protein interactions (PPI). The corpus consists of 43 documents annotated with true and false PPI relations.

AIMed: The AIMed dataset was developed to evaluate protein name recognition and protein-protein interaction (PPI) extraction. The corpus consists of 225 documents annotated with true and false PPI relations.

MedNLI: MedNLI is collected from MIMIC-III in the form of premise-hypothesis pairs. Annotated by radiologists, each pair is labeled as entailment, contradiction, or neutral, according to whether the premise entails the hypothesis.
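The six corpora cover three task types, so a loader usually needs to know each corpus's task and label set up front. The sketch below records only what the description states (document counts and label names). The `Corpus` class, the `CORPORA` registry, the `is_valid_label` helper, and the label strings `"Disease"`, `"true"`, and `"false"` are hypothetical names chosen for illustration; the files distributed on IEEE Dataport may encode labels differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Corpus:
    """Hypothetical summary record for one corpus in the bundle."""
    name: str
    task: str                           # "NER", "RE" (relation extraction), or "NLI"
    labels: Tuple[str, ...]             # label set as named in the description
    n_documents: Optional[int] = None   # None where the description gives no count


# Values taken from the description; label spellings are assumptions.
CORPORA = {
    "NCBI": Corpus("NCBI", "NER", ("Disease",), 793),
    "BC5CDR-disease": Corpus("BC5CDR-disease", "NER", ("Disease",), 1500),
    "i2b2 2010": Corpus(
        "i2b2 2010",
        "RE",
        ("TrIP", "TrWP", "TrCP", "TrAP", "TrNAP", "PIP", "TeRP", "TeCP"),
    ),
    "HPRD50": Corpus("HPRD50", "RE", ("true", "false"), 43),
    "AIMed": Corpus("AIMed", "RE", ("true", "false"), 225),
    "MedNLI": Corpus("MedNLI", "NLI", ("entailment", "contradiction", "neutral")),
}


def is_valid_label(corpus_name: str, label: str) -> bool:
    """Return True if `label` belongs to the label set recorded for the corpus."""
    return label in CORPORA[corpus_name].labels
```

A check such as `is_valid_label("i2b2 2010", "TrNAP")` can then guard against mixing label sets when the corpora are processed in one pipeline.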

Provider:
IEEE Dataport