
tsantosh7/COVID-19_Annotations

Source: Hugging Face. Updated: 2022-04-21. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tsantosh7/COVID-19_Annotations
Resource description:
License: cc

Named Entity Recognition for COVID-19 Bio Entities

The dataset was taken from https://github.com/davidcampos/covid19-corpus

Dataset

The dataset was then split into several datasets, each one representing one entity: Disorder, Species, Chemical or Drug, Gene and Protein, Enzyme, Anatomy, Biological Process, Molecular Function, Cellular Component, Pathway and microRNA. Moreover, another dataset was created with all those aforementioned that are non-overlapping in nature.

Dataset Formats

The datasets are available in two formats: IOB and spaCy's JSONL format.

IOB: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/BIO
SpaCy JSONL: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/SpaCy
Provided by:
tsantosh7
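Summary of original information

Dataset overview

Dataset name

Named Entity Recognition for COVID-19 Bio Entities

Data source

The dataset was taken from https://github.com/davidcampos/covid19-corpus.

Dataset contents

The dataset was split into several subsets, each representing one type of biological entity: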

  • Disorder
  • Species
  • Chemical or Drug
  • Gene and Protein
  • Enzyme
  • Anatomy
  • Biological Process
  • Molecular Function
  • Cellular Component
  • Pathway
  • microRNA

In addition, a further dataset was created that combines all of the above entities that are non-overlapping.
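
The listing does not say how the combined non-overlapping dataset was built. As a minimal sketch of one possible reading (keep only annotations whose character span overlaps no other annotation), the following Python may help. The `(start, end, label)` span layout, the function name `keep_non_overlapping`, and the example offsets are all illustrative assumptions, not taken from the dataset.

```python
from typing import List, Tuple

# Assumed layout: (start_offset, end_offset_exclusive, entity_label)
Span = Tuple[int, int, str]


def keep_non_overlapping(spans: List[Span]) -> List[Span]:
    """Return the spans that overlap no other span, sorted by start offset."""
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))
    kept: List[Span] = []
    max_end_before = -1  # furthest end offset among all earlier spans
    for i, (start, end, label) in enumerate(ordered):
        # An earlier span overlaps this one if it ends after this one starts.
        overlaps_prev = start < max_end_before
        # The next span has the smallest later start, so checking it is enough.
        overlaps_next = i + 1 < len(ordered) and ordered[i + 1][0] < end
        if not overlaps_prev and not overlaps_next:
            kept.append((start, end, label))
        max_end_before = max(max_end_before, end)
    return kept


# Hypothetical annotations: the Enzyme span sits inside the Gene and Protein span.
example = [
    (0, 8, "Disorder"),
    (12, 22, "Gene and Protein"),
    (12, 18, "Enzyme"),
    (30, 35, "Species"),
]
merged = keep_non_overlapping(example)
print(merged)
```

Under this reading both members of an overlapping pair are dropped. A different policy, such as keeping the longest span, would be an equally plausible interpretation.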

Dataset formats

The dataset is available in two formats:

  • IOB format: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/BIO
  • spaCy JSONL format: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/SpaCy
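
To illustrate how IOB-tagged data is typically consumed, here is a minimal Python sketch that groups token/tag lines into sentences and turns the tags into entity spans. It assumes a CoNLL-style layout (one token per line, the tag in the last whitespace-separated column, a blank line between sentences). The actual files in the repository may differ, and the sample sentence and tag names below are invented for illustration.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_iob(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences; a blank line ends a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: the token is the first column and the tag is the last.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB tags into (start, end_exclusive, label) token spans."""
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and label == name
        if not continues:
            if label is not None:
                spans.append((start, i, label))  # close the open entity
                start, label = None, None
            if prefix in ("B", "I") and name:
                start, label = i, name  # lenient: a stray I- tag opens an entity
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Invented sample, not taken from the dataset.
sample = """COVID-19 B-Disorder
affects O
the O
respiratory B-Anatomy
tract I-Anatomy
. O
""".splitlines()

sentences = read_iob(sample)
tags = [tag for _, tag in sentences[0]]
spans = iob_to_spans(tags)
print(spans)
```

The spans use token indices rather than character offsets. Mapping them to character offsets would need the original, untokenized text.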

License

The dataset is released under a CC license.
