X-qi/msra_ner
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/X-qi/msra_ner
Description:
---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- zh
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- named-entity-recognition
pretty_name: MSRA NER
dataset_info:
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-PER
          '2': I-PER
          '3': B-ORG
          '4': I-ORG
          '5': B-LOC
          '6': I-LOC
  config_name: msra_ner
  splits:
  - name: train
    num_bytes: 33323074
    num_examples: 45001
  - name: test
    num_bytes: 2642934
    num_examples: 3443
  download_size: 15156606
  dataset_size: 35966008
train-eval-index:
- config: msra_ner
  task: token-classification
  task_id: entity_extraction
  splits:
    train_split: train
    eval_split: test
  col_mapping:
    tokens: tokens
    ner_tags: tags
  metrics:
  - type: seqeval
    name: seqeval
---
# Dataset Card for MSRA NER
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [GitHub](https://github.com/OYE93/Chinese-NLP-Corpus/tree/master/NER/MSRA)
- **Repository:** [GitHub](https://github.com/OYE93/Chinese-NLP-Corpus)
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
[More Information Needed]
### Supported Tasks and Leaderboards
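Per the card metadata, the dataset supports token classification, specifically named entity recognition, and names `seqeval` as the evaluation metric. No leaderboard is listed.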
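
The `train-eval-index` metadata names `seqeval` as the metric. As a rough illustration of the entity-level scoring such a metric reports, here is a minimal pure-Python sketch. It is a simplified stand-in and not the `seqeval` library itself: the function names `bio_spans` and `entity_f1` and the label sequences are made up for illustration, and stray `I-` tags with no preceding `B-` tag of the same type are simply ignored.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(labels: List[str]) -> Set[Span]:
    """Collect (type, start, end) spans from one BIO label sequence."""
    spans: Set[Span] = set()
    start, kind = None, None
    for i, label in enumerate(labels + ["O"]):  # trailing "O" flushes the last span
        continues = label.startswith("I-") and label[2:] == kind
        if start is not None and not continues:
            spans.add((kind, start, i))
            start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact span matches."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted labels for one five-token sentence.
gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

The predicted `LOC` span stops one token early, so it does not count as a match: a span scores only when its type and both boundaries agree with the gold annotation.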
### Languages
Chinese (`zh`). The card metadata marks the dataset as monolingual.
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
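The card metadata declares three fields:
- `id`: a string identifier for the example.
- `tokens`: a sequence of strings holding the tokens of the example.
- `ner_tags`: a sequence of class labels aligned with `tokens`, with ids `0`: `O`, `1`: `B-PER`, `2`: `I-PER`, `3`: `B-ORG`, `4`: `I-ORG`, `5`: `B-LOC`, `6`: `I-LOC`.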
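
The `ner_tags` class labels in the card metadata are stored as integer ids. The sketch below shows one way to map them back to label names and to check that a sequence follows the BIO convention. The helper names `decode_tags` and `is_valid_bio` are hypothetical, and the sample record is invented for illustration (it assumes character-level tokens and is not taken from the dataset).

```python
from typing import List

# Label names in id order, copied from the `ner_tags` class_label metadata.
LABEL_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABEL_NAMES)}


def decode_tags(tag_ids: List[int]) -> List[str]:
    """Map integer tag ids to their label names."""
    return [LABEL_NAMES[i] for i in tag_ids]


def is_valid_bio(labels: List[str]) -> bool:
    """True if every I-X tag directly follows a B-X or I-X tag of the same type."""
    previous = "O"
    for label in labels:
        if label.startswith("I-") and previous[2:] != label[2:]:
            return False
        previous = label
    return True


# Invented record with the three declared fields (not real dataset content).
record = {
    "id": "0",
    "tokens": ["李", "明", "在", "北", "京"],
    "ner_tags": [1, 2, 0, 5, 6],
}

labels = decode_tags(record["ner_tags"])
for token, label in zip(record["tokens"], labels):
    print(token, label)
print(is_valid_bio(labels))
```

Keeping the names in a list indexed by id mirrors the `'0'` to `'6'` mapping in the metadata, so the reverse lookup `LABEL_TO_ID` can be derived from it rather than maintained separately.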
### Data Splits
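The card metadata lists two splits:
- `train`: 45,001 examples (33,323,074 bytes)
- `test`: 3,443 examples (2,642,934 bytes)

The download size is 15,156,606 bytes and the total dataset size is 35,966,008 bytes. No validation split is listed.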
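
As a quick sanity check on the split figures in the card metadata, the following sketch confirms that the per-split byte counts add up to the declared `dataset_size` and reports each split's share of the examples. The variable names are illustrative only; the numbers are copied from the metadata.

```python
# Split statistics copied from the `dataset_info` metadata.
SPLITS = {
    "train": {"num_bytes": 33323074, "num_examples": 45001},
    "test": {"num_bytes": 2642934, "num_examples": 3443},
}
DATASET_SIZE = 35966008  # declared `dataset_size` in bytes

total_bytes = sum(split["num_bytes"] for split in SPLITS.values())
total_examples = sum(split["num_examples"] for split in SPLITS.values())

# The declared total should equal the sum of the per-split sizes.
assert total_bytes == DATASET_SIZE

for name, split in SPLITS.items():
    share = split["num_examples"] / total_examples
    print(f"{name}: {split['num_examples']} examples ({share:.1%})")
```

Because only `train` and `test` are declared, anyone who needs a validation set would have to hold out part of `train` themselves.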
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
The card metadata lists the license as `unknown`.
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@JetRunner](https://github.com/JetRunner) for adding this dataset.
Provider:
X-qi



