
conllpp_for_ner

ModelScope community. Updated 2026-04-25. Indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/yingxi/conllpp_for_ner
Resource description:
# conllpp Named Entity Recognition Dataset

## Dataset Overview

The conllpp dataset is a corrected version of the CoNLL2003 named entity recognition (NER) dataset: the labels of 5.38% of the sentences in the test set were corrected through manual verification. To keep the dataset complete, it also includes the training and validation sets of CoNLL2003.

### Dataset Introduction

The dataset consists of a training set (14041 sentences), a validation set (3250 sentences), and a test set (3453 sentences). The entity types are Location (LOC), Miscellaneous (MISC), Organization (ORG), and Person (PER).

### Dataset Format and Structure

The data follows the CoNLL standard format, with one token per line. Two columns matter for NER: the first column holds a token of the input sentence, and the last column holds that token's named entity tag. The sample below also carries part-of-speech and chunk tags in its two middle columns. A concrete example:

```
SOCCER NN I-NP O
- : O O
JAPAN NNP I-NP B-LOC
GET VB I-VP O
LUCKY NNP I-NP O
WIN NNP I-NP O
, , O O
CHINA NNP I-NP B-LOC
IN IN I-PP O
SURPRISE DT I-NP O
DEFEAT NN I-NP O
. . O O
```

### Clone with HTTP

```bash
git clone https://www.modelscope.cn/datasets/yingxi/conllpp_for_ner.git
```
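For reading the data after cloning, the layout described under "Dataset Format and Structure" can be parsed with a few lines of Python. This is a minimal sketch, not the dataset's official loader. It assumes one token per line, whitespace-separated columns with the token first and the NER tag last, and blank lines or `-DOCSTART-` markers (as in the original CoNLL2003 files) between sentences. The function names `parse_conll` and `extract_entities` are hypothetical, and the file names inside the repository are not assumed here.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs


def parse_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, NER tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and -DOCSTART- markers act as separators.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the token, last column is the NER tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) spans from B-/I- prefixed tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        prefix, _, kind = tag.partition("-")
        # An I- tag of the same type continues the open entity.
        if prefix == "I" and tokens and kind == label:
            tokens.append(token)
            continue
        # Anything else closes the open entity, if there is one.
        if tokens:
            entities.append((" ".join(tokens), label))
            tokens, label = [], None
        # A B- tag (or an I- tag with no open entity) starts a new one.
        if prefix in ("B", "I"):
            tokens, label = [token], kind
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# The sample sentence from the dataset description.
SAMPLE = """\
SOCCER NN I-NP O
- : O O
JAPAN NNP I-NP B-LOC
GET VB I-VP O
LUCKY NNP I-NP O
WIN NNP I-NP O
, , O O
CHINA NNP I-NP B-LOC
IN IN I-PP O
SURPRISE DT I-NP O
DEFEAT NN I-NP O
. . O O
"""

sentences = parse_conll(SAMPLE.splitlines())
print(extract_entities(sentences[0]))  # [('JAPAN', 'LOC'), ('CHINA', 'LOC')]
```

Reading only the first and last columns keeps the parser indifferent to whether the part-of-speech and chunk columns are present, so the same sketch would handle a two-column file.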
Provider:
maas
Created:
2023-02-15