arubenruben/ontonotes5.0-pt
Hugging Face · Updated 2023-05-12 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/arubenruben/ontonotes5.0-pt
Resource description:
---
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-PERSON
          '2': I-PERSON
          '3': B-NORP
          '4': I-NORP
          '5': B-FAC
          '6': I-FAC
          '7': B-ORG
          '8': I-ORG
          '9': B-GPE
          '10': I-GPE
          '11': B-LOC
          '12': I-LOC
          '13': B-PRODUCT
          '14': I-PRODUCT
          '15': B-DATE
          '16': I-DATE
          '17': B-TIME
          '18': I-TIME
          '19': B-PERCENT
          '20': I-PERCENT
          '21': B-MONEY
          '22': I-MONEY
          '23': B-QUANTITY
          '24': I-QUANTITY
          '25': B-ORDINAL
          '26': I-ORDINAL
          '27': B-CARDINAL
          '28': I-CARDINAL
          '29': B-EVENT
          '30': I-EVENT
          '31': B-WORK_OF_ART
          '32': I-WORK_OF_ART
          '33': B-LAW
          '34': I-LAW
          '35': B-LANGUAGE
          '36': I-LANGUAGE
  splits:
  - name: train
    num_bytes: 16511400
    num_examples: 1898
  - name: validation
    num_bytes: 2417378
    num_examples: 279
  - name: test
    num_bytes: 1564609
    num_examples: 163
  download_size: 0
  dataset_size: 20493387
---
# Dataset Card for "ontonotes5.0-pt"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
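The `class_label` names in the metadata follow a regular pattern: id 0 is `O`, and each entity type then takes two consecutive ids, `B-` first and `I-` second. The sketch below rebuilds that table in plain Python and decodes a list of tag ids. The names `ENTITY_TYPES`, `LABELS`, and `decode` are illustrative assumptions, not part of the dataset or of any library.

```python
# Entity types in the order they appear in the card's class_label names.
ENTITY_TYPES = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "DATE", "TIME",
    "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE",
]

# Index 0 is "O"; each type then contributes "B-<type>" followed by "I-<type>".
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def decode(tag_ids):
    """Map integer ner_tags to their string labels (hypothetical helper)."""
    return [LABELS[i] for i in tag_ids]


print(len(LABELS))            # 37 labels, ids 0 through 36
print(decode([1, 2, 0, 9]))   # ['B-PERSON', 'I-PERSON', 'O', 'B-GPE']
```

When the dataset is loaded through the Hugging Face `datasets` library, the same names are normally exposed by the `ClassLabel` feature, so a hand-built table like this one is mainly useful for checking or for code that does not depend on that library.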
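---
Dataset information:
Features:
- Name: tokens, type: sequence of strings
- Name: ner_tags (named entity recognition tags), type: sequence, with the following class labels:
'0': O (outside any entity)
'1': B-PERSON (beginning of a person entity)
'2': I-PERSON (inside a person entity)
'3': B-NORP (beginning of a nationality, religious, or political group entity)
'4': I-NORP (inside a nationality, religious, or political group entity)
'5': B-FAC (beginning of a facility entity)
'6': I-FAC (inside a facility entity)
'7': B-ORG (beginning of an organization entity)
'8': I-ORG (inside an organization entity)
'9': B-GPE (beginning of a geopolitical entity, such as a country or city)
'10': I-GPE (inside a geopolitical entity)
'11': B-LOC (beginning of a natural geographic location entity)
'12': I-LOC (inside a natural geographic location entity)
'13': B-PRODUCT (beginning of a product entity)
'14': I-PRODUCT (inside a product entity)
'15': B-DATE (beginning of a date entity)
'16': I-DATE (inside a date entity)
'17': B-TIME (beginning of a time entity)
'18': I-TIME (inside a time entity)
'19': B-PERCENT (beginning of a percentage entity)
'20': I-PERCENT (inside a percentage entity)
'21': B-MONEY (beginning of a monetary amount entity)
'22': I-MONEY (inside a monetary amount entity)
'23': B-QUANTITY (beginning of a quantity entity)
'24': I-QUANTITY (inside a quantity entity)
'25': B-ORDINAL (beginning of an ordinal number entity)
'26': I-ORDINAL (inside an ordinal number entity)
'27': B-CARDINAL (beginning of a cardinal number entity)
'28': I-CARDINAL (inside a cardinal number entity)
'29': B-EVENT (beginning of an event entity)
'30': I-EVENT (inside an event entity)
'31': B-WORK_OF_ART (beginning of a work of art entity)
'32': I-WORK_OF_ART (inside a work of art entity)
'33': B-LAW (beginning of a legal provision entity)
'34': I-LAW (inside a legal provision entity)
'35': B-LANGUAGE (beginning of a language entity)
'36': I-LANGUAGE (inside a language entity)
Splits:
- Name: train, size in bytes: 16511400, number of examples: 1898
- Name: validation, size in bytes: 2417378, number of examples: 279
- Name: test, size in bytes: 1564609, number of examples: 163
Download size: 0
Total dataset size: 20493387
---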
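Provider: arubenruben
Original information summary
Dataset overview
Dataset features
- tokens: sequence of strings
- ner_tags: sequence of class labels, with the following classes: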
- 0: O
- 1: B-PERSON
- 2: I-PERSON
- 3: B-NORP
- 4: I-NORP
- 5: B-FAC
- 6: I-FAC
- 7: B-ORG
- 8: I-ORG
- 9: B-GPE
- 10: I-GPE
- 11: B-LOC
- 12: I-LOC
- 13: B-PRODUCT
- 14: I-PRODUCT
- 15: B-DATE
- 16: I-DATE
- 17: B-TIME
- 18: I-TIME
- 19: B-PERCENT
- 20: I-PERCENT
- 21: B-MONEY
- 22: I-MONEY
- 23: B-QUANTITY
- 24: I-QUANTITY
- 25: B-ORDINAL
- 26: I-ORDINAL
- 27: B-CARDINAL
- 28: I-CARDINAL
- 29: B-EVENT
- 30: I-EVENT
- 31: B-WORK_OF_ART
- 32: I-WORK_OF_ART
- 33: B-LAW
- 34: I-LAW
- 35: B-LANGUAGE
- 36: I-LANGUAGE
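The tags listed above use the B-/I-/O scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. A minimal sketch of turning one tagged sentence into entity spans is given below. The function name `bio_spans` and the sample sentence are made up for illustration and are not taken from the dataset. Treating a stray `I-` tag as the start of a new entity is an assumption of this sketch, not a documented convention of the corpus.

```python
def bio_spans(tokens, labels):
    """Group BIO labels into (entity_type, start, end, text) spans.

    `end` is exclusive. A stray I- tag (no matching B- before it) is
    treated as the start of a new entity, which is a lenient assumption.
    """
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        prefix, _, etype = label.partition("-")
        if prefix == "I" and etype == current:
            continue  # still inside the currently open entity
        if current is not None:
            # Any other label closes the open entity.
            spans.append((current, start, i, " ".join(tokens[start:i])))
            start, current = None, None
        if prefix in ("B", "I"):
            start, current = i, etype
    if current is not None:
        spans.append((current, start, len(labels), " ".join(tokens[start:])))
    return spans


# Hypothetical example sentence, not drawn from the dataset.
tokens = ["Maria", "Silva", "mora", "em", "Lisboa", "."]
labels = ["B-PERSON", "I-PERSON", "O", "O", "B-GPE", "O"]
print(bio_spans(tokens, labels))
# [('PERSON', 0, 2, 'Maria Silva'), ('GPE', 4, 5, 'Lisboa')]
```

The function works on string labels, so integer `ner_tags` would first need to be mapped to names using the table above.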
Dataset splits
- train:
  - Bytes: 16511400
  - Examples: 1898
- validation:
  - Bytes: 2417378
  - Examples: 279
- test:
  - Bytes: 1564609
  - Examples: 163
Dataset size
- Download size: 0
- Total dataset size: 20493387 bytes
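As a quick consistency check on these figures, the short sketch below sums the per-split byte counts and compares the result with the reported total, then prints each split's share of the examples. The dictionary `SPLITS` simply restates the numbers from the card; the variable names are illustrative.

```python
# Per-split figures copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 16511400, "num_examples": 1898},
    "validation": {"num_bytes": 2417378, "num_examples": 279},
    "test": {"num_bytes": 1564609, "num_examples": 163},
}
REPORTED_DATASET_SIZE = 20493387  # dataset_size from the card, in bytes

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The split sizes add up exactly to the reported dataset size.
assert total_bytes == REPORTED_DATASET_SIZE

print(f"total examples: {total_examples}")  # total examples: 2340
for name, s in SPLITS.items():
    share = s["num_examples"] / total_examples
    print(f"{name}: {s['num_examples']} examples ({share:.1%})")
```

The split is roughly 81% train, 12% validation, and 7% test by example count. The reported download size of 0 is reproduced from the card as-is.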



