chintagunta85/bc2gm_test
收藏Hugging Face2022-07-28 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/chintagunta85/bc2gm_test
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- named-entity-recognition
paperswithcode_id: null
pretty_name: Bc2GmCorpus
---
# Dataset Card for bc2gm_corpus
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [Github](https://github.com/spyysalo/bc2gm-corpus/)
- **Repository:** [Github](https://github.com/spyysalo/bc2gm-corpus/)
- **Paper:** [NCBI](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559986/)
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
[More Information Needed]
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
[More Information Needed]
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
- `id`: Sentence identifier.
- `tokens`: Array of tokens composing a sentence.
- `ner_tags`: Array of tags, where `0` indicates no disease mentioned, `1` signals the first token of a disease and `2` the subsequent disease tokens.
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@mahajandiwakar](https://github.com/mahajandiwakar) for adding this dataset.
提供机构:
chintagunta85
原始信息汇总
数据集概述
- 名称: Bc2GmCorpus
- 语言: 英语(en)
- 许可证: 未知
- 多语言性: 单语
- 大小: 10K<n<100K
- 来源: 原始数据
- 任务类别: 令牌分类
- 任务ID: 命名实体识别
数据集结构
- 数据实例:
id: 句子标识符。tokens: 组成句子的令牌数组。ner_tags: 标签数组,其中0表示未提及疾病,1表示疾病的首个令牌,2表示后续的疾病令牌。
数据集创建
- 注释创建者: 专家生成
- 语言创建者: 专家生成
数据字段
id: 句子标识符。tokens: 组成句子的令牌数组。ner_tags: 标签数组,其中0表示未提及疾病,1表示疾病的首个令牌,2表示后续的疾病令牌。



