tsantosh7/COVID-19_Annotations
Source: Hugging Face. Updated 2022-04-21; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tsantosh7/COVID-19_Annotations
Resource description:
---
license: cc
---
Named Entity Recognition for COVID-19 Bio Entities
The dataset was taken from https://github.com/davidcampos/covid19-corpus
Dataset
The dataset was then split into several datasets, each representing one entity type: Disorder, Species, Chemical or Drug, Gene and Protein, Enzyme, Anatomy, Biological Process, Molecular Function, Cellular Component, Pathway, and microRNA. An additional dataset combines all of the above entities that do not overlap with one another.
Dataset Formats
The datasets are available in two formats: IOB and spaCy's JSONL format.
IOB: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/BIO
SpaCy JSONL: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/SpaCy
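As a rough illustration of how IOB-tagged data can be consumed, the sketch below groups tagged lines into sentences and converts the tags into token-level entity spans. It assumes a common IOB layout that this page does not confirm: one token per line, the tag in the last whitespace-separated column, and a blank line between sentences. The function names `read_iob` and `iob_to_spans`, the sample tokens, and the tag names (e.g. `B-Species`) are hypothetical, so check the actual files before relying on it.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_iob(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token ... tag' lines into sentences.

    Assumption: a blank line ends a sentence and the tag is the last column.
    Lines with fewer than two columns are skipped as malformed.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def iob_to_spans(sentence: Sentence) -> List[Tuple[int, int, str]]:
    """Convert IOB tags to (start_token, end_token_exclusive, label) spans."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, (_, tag) in enumerate(sentence):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag with no matching open entity starts a new one.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(sentence), label))
    return spans


# Hypothetical sample; the tag names are assumptions, not taken from the files.
sample = [
    "SARS-CoV-2 B-Species",
    "infects O",
    "lung B-Anatomy",
    "epithelial I-Anatomy",
    "cells I-Anatomy",
    "",
    "Fever B-Disorder",
]
sentences = read_iob(sample)
spans = iob_to_spans(sentences[0])
```

Spans are expressed in token indices rather than character offsets, which keeps the sketch independent of how the original text was tokenized.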
Provider:
tsantosh7
Summary of original information
Dataset overview
Dataset name
Named Entity Recognition for COVID-19 Bio Entities
Data source
The dataset was taken from https://github.com/davidcampos/covid19-corpus.
Dataset contents
The dataset is split into several subsets, each representing one type of biological entity:
- Disorder
- Species
- Chemical or Drug
- Gene and Protein
- Enzyme
- Anatomy
- Biological Process
- Molecular Function
- Cellular Component
- Pathway
- microRNA
In addition, a combined set was created containing all of the above entities that do not overlap.
Dataset formats
The dataset is provided in two formats:
- IOB format: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/BIO
- spaCy JSONL format: https://github.com/tsantosh7/COVID-19-Named-Entity-Recognition/tree/master/Datasets/SpaCy
License
The dataset is released under a CC license.



