NCBI Disease Corpus
Source: OpenDataLab | Updated: 2026-05-17 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/NCBI_Disease_Corpus
Description:
The NCBI Disease Corpus is fully annotated at both the mention and concept levels, serving as a research resource for the biomedical natural language processing community.
Corpus Characteristics:
- 793 PubMed abstracts
- 6,892 disease mentions
- 790 unique disease concepts, with annotations referencing Medical Subject Headings (MeSH®) and Online Mendelian Inheritance in Man (OMIM®)
- 91% of mentions are mapped to a single disease concept
- The corpus is divided into training, development, and test sets
Corpus Annotation:
- 14 annotators in total
- Two annotators were randomly paired for each document
- Three annotation stages were conducted
- Annotation consistency across the entire corpus was verified
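The characteristics listed above can be recomputed from the annotation files. The following is a minimal sketch, not an official loader. It assumes the files use a PubTator-style layout (`PMID|t|title`, `PMID|a|abstract`, then tab-separated rows of PMID, start, end, mention text, mention type, concept ID), and that several concept IDs on one mention are joined with `|` or `+`. The names `parse_pubtator`, `summarize`, and `SAMPLE` are hypothetical, and the sample record is invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    pmid: str
    start: int
    end: int
    text: str
    mention_type: str
    concept_ids: list


# Assumed header layout: "<PMID>|t|<title>" or "<PMID>|a|<abstract>".
HEADER = re.compile(r"^(\d+)\|([ta])\|(.*)$")


def parse_pubtator(raw):
    """Return (docs, mentions) parsed from PubTator-style text."""
    docs = {}
    mentions = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        header = HEADER.match(line)
        if header:
            pmid, kind, body = header.groups()
            docs.setdefault(pmid, {})[kind] = body
            continue
        cols = line.split("\t")
        if len(cols) == 6:
            pmid, start, end, text, mention_type, ids = cols
            # Assumption: multiple concept IDs are joined with "|" or "+".
            concept_ids = [c for c in re.split(r"[|+]", ids) if c]
            mentions.append(
                Mention(pmid, int(start), int(end), text, mention_type, concept_ids)
            )
    return docs, mentions


def summarize(mentions):
    """Count mentions, unique concepts, and the share with exactly one concept."""
    total = len(mentions)
    unique = {c for m in mentions for c in m.concept_ids}
    single = sum(1 for m in mentions if len(m.concept_ids) == 1)
    return {
        "mentions": total,
        "unique_concepts": len(unique),
        "single_concept_share": single / total if total else 0.0,
    }


# Invented sample record, for illustration only.
SAMPLE = (
    "1|t|BRCA1 and breast cancer\n"
    "1|a|Mutations cause breast and ovarian cancer.\n"
    "1\t10\t23\tbreast cancer\tSpecificDisease\tD001943\n"
    "1\t40\t65\tbreast and ovarian cancer\tCompositeMention\tD001943|D010051\n"
)

docs, mentions = parse_pubtator(SAMPLE)
stats = summarize(mentions)
print(stats)
```

Run on the full corpus files, the same counts could be compared with the figures listed above (6,892 mentions, 790 unique concepts, 91% single-concept mentions), provided the format assumptions hold.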
Provider:
OpenDataLab
Created:
2022-05-07
Collected summary
Dataset introduction

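Background and challenges
Background overview
The NCBI Disease Corpus is a research resource for biomedical natural language processing. It contains 6,892 normalized disease mentions across 793 PubMed abstracts, annotated through a rigorous multi-stage process. The dataset was released in 2012, provides training/development/test splits, and is mainly used for disease named entity recognition.
The content above was collected and summarized by 遇见数据集.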
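For the disease named entity recognition use mentioned above, character-level mention spans are commonly converted into token-level BIO tags before training a tagger. The sketch below shows one simple way to do this. It assumes a plain whitespace tokenizer (so punctuation stays attached to words) and half-open `(start, end)` character spans. The function name `bio_tags` and the example sentence are hypothetical and not part of the corpus distribution.

```python
import re


def bio_tags(text, spans, label="Disease"):
    """Tag whitespace tokens with B-/I-/O labels from character spans.

    spans is a list of (start, end) offsets, end exclusive.
    Any token overlapping a span is treated as part of that mention.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end in sorted(spans):
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start < end and tok_end > start:  # token overlaps the span
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Invented example sentence; "breast cancer" covers characters 16 to 29.
sentence = "Mutations cause breast cancer in families."
tagged = bio_tags(sentence, [(16, 29)])
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

A real pipeline would likely use a biomedical tokenizer and handle nested or overlapping mentions explicitly; this sketch does neither.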



