conllpp_for_ner
Source: ModelScope community. Updated: 2026-04-25. Indexed: 2024-05-15.
Download link:
https://modelscope.cn/datasets/yingxi/conllpp_for_ner
Description:
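# conllpp Named Entity Recognition Dataset
## Dataset Overview
The conllpp dataset is a corrected version of the CoNLL2003 named entity recognition (NER) dataset: the labels of 5.38% of the sentences in the test set were fixed through manual review.
To keep the dataset complete, it also includes the CoNLL2003 training and validation sets.
### Dataset Introduction
The dataset contains a training set (14,041 sentences), a validation set (3,250 sentences), and a test set (3,453 sentences). The entity types are location (LOC), miscellaneous (MISC), organization (ORG), and person (PER).
### Dataset Format and Structure
The data follows the CoNLL format, with one token per line. For NER, two columns matter: the first column holds the token, and the last column holds that token's named entity tag. The sample below also carries two middle columns, which in the CoNLL2003 layout are the part-of-speech tag and the syntactic chunk tag. A concrete example: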
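For training a tagger, the four entity types are usually expanded into a tag vocabulary. The following is a minimal sketch, assuming the `B-`/`I-` prefix scheme seen in this dataset's sample data; the helper name `build_label_maps` and the index order are hypothetical and may differ from any label order shipped with the dataset.

```python
from typing import Dict, List, Tuple

# Entity types as listed in the dataset description.
ENTITY_TYPES = ["LOC", "MISC", "ORG", "PER"]


def build_label_maps(entity_types: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build tag<->index maps: "O" first, then a B-/I- pair per entity type.

    Assumption: the ordering here is illustrative only.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    label2id = {label: index for index, label in enumerate(labels)}
    id2label = {index: label for label, index in label2id.items()}
    return label2id, id2label


label2id, id2label = build_label_maps(ENTITY_TYPES)
print(label2id)
```

With four entity types this yields nine tags in total: one `O` tag plus a `B-` and an `I-` tag per type.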
```
SOCCER NN I-NP O
- : O O
JAPAN NNP I-NP B-LOC
GET VB I-VP O
LUCKY NNP I-NP O
WIN NNP I-NP O
, , O O
CHINA NNP I-NP B-LOC
IN IN I-PP O
SURPRISE DT I-NP O
DEFEAT NN I-NP O
. . O O
```
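The layout above can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader: the function names `read_conll` and `extract_entities` are hypothetical, and it assumes that sentences are separated by blank lines and that `-DOCSTART-` lines mark document boundaries, which is the usual CoNLL2003 convention but is not stated on this page.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group whitespace-separated CoNLL lines into sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Assumption: a blank line or a -DOCSTART- marker ends a sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the token, last column is the NER tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) spans from B-/I- prefixed tags."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    entity_type: Optional[str] = None
    for token, tag in sentence:
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == entity_type
        if words and not continues:
            # The open span ends here: flush it.
            entities.append((" ".join(words), entity_type))
            words, entity_type = [], None
        if prefix in ("B", "I"):
            if not words:
                entity_type = label  # start a new span
            words.append(token)
    if words:
        entities.append((" ".join(words), entity_type))
    return entities


sample = """\
SOCCER NN I-NP O
- : O O
JAPAN NNP I-NP B-LOC
GET VB I-VP O
LUCKY NNP I-NP O
WIN NNP I-NP O
, , O O
CHINA NNP I-NP B-LOC
IN IN I-PP O
SURPRISE DT I-NP O
DEFEAT NN I-NP O
. . O O
"""

sentences = read_conll(sample.splitlines())
print(extract_entities(sentences[0]))
```

Reading only the first and last columns keeps the sketch indifferent to whether the middle part-of-speech and chunk columns are present. An `I-` tag with no open span is treated as the start of a new entity, so the function tolerates both tagging variants.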
### Clone with HTTP
```bash
git clone https://www.modelscope.cn/datasets/yingxi/conllpp_for_ner.git
```
Provider:
maas
Created:
2023-02-15



