stulcrad/CNEC2_0_flat
Source: Hugging Face. Updated 2024-05-16; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/stulcrad/CNEC2_0_flat
Resource description:
---
language:
- cs
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-ah
          '2': I-ah
          '3': B-at
          '4': I-at
          '5': B-az
          '6': I-az
          '7': B-g_
          '8': I-g_
          '9': B-gc
          '10': I-gc
          '11': B-gh
          '12': I-gh
          '13': B-gl
          '14': I-gl
          '15': B-gq
          '16': I-gq
          '17': B-gr
          '18': I-gr
          '19': B-gs
          '20': I-gs
          '21': B-gt
          '22': I-gt
          '23': B-gu
          '24': I-gu
          '25': B-i_
          '26': I-i_
          '27': B-ia
          '28': I-ia
          '29': B-ic
          '30': I-ic
          '31': B-if
          '32': I-if
          '33': B-io
          '34': I-io
          '35': B-me
          '36': I-me
          '37': B-mi
          '38': I-mi
          '39': B-mn
          '40': I-mn
          '41': B-ms
          '42': I-ms
          '43': B-n_
          '44': I-n_
          '45': B-na
          '46': I-na
          '47': B-nb
          '48': I-nb
          '49': B-nc
          '50': I-nc
          '51': B-ni
          '52': I-ni
          '53': B-no
          '54': I-no
          '55': B-ns
          '56': I-ns
          '57': B-o_
          '58': I-o_
          '59': B-oa
          '60': I-oa
          '61': B-oe
          '62': I-oe
          '63': B-om
          '64': I-om
          '65': B-op
          '66': I-op
          '67': B-or
          '68': I-or
          '69': B-p_
          '70': I-p_
          '71': B-pc
          '72': I-pc
          '73': B-pd
          '74': I-pd
          '75': B-pf
          '76': I-pf
          '77': B-pm
          '78': I-pm
          '79': B-pp
          '80': I-pp
          '81': B-ps
          '82': I-ps
          '83': B-td
          '84': I-td
          '85': B-tf
          '86': I-tf
          '87': B-th
          '88': I-th
          '89': B-tm
          '90': I-tm
          '91': B-ty
          '92': I-ty
  splits:
  - name: train
    num_bytes: 2798586
    num_examples: 7193
  - name: validation
    num_bytes: 350253
    num_examples: 900
  - name: test
    num_bytes: 352146
    num_examples: 899
  download_size: 1219405
  dataset_size: 3500985
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
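Provider:
stulcrad
Summary of original information
Dataset overview
Dataset features
- tokens: sequence of strings
- ner_tags: sequence of class labels, with the following classes: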
- 0: O
- 1: B-ah
- 2: I-ah
- 3: B-at
- 4: I-at
- 5: B-az
- 6: I-az
- 7: B-g_
- 8: I-g_
- 9: B-gc
- 10: I-gc
- 11: B-gh
- 12: I-gh
- 13: B-gl
- 14: I-gl
- 15: B-gq
- 16: I-gq
- 17: B-gr
- 18: I-gr
- 19: B-gs
- 20: I-gs
- 21: B-gt
- 22: I-gt
- 23: B-gu
- 24: I-gu
- 25: B-i_
- 26: I-i_
- 27: B-ia
- 28: I-ia
- 29: B-ic
- 30: I-ic
- 31: B-if
- 32: I-if
- 33: B-io
- 34: I-io
- 35: B-me
- 36: I-me
- 37: B-mi
- 38: I-mi
- 39: B-mn
- 40: I-mn
- 41: B-ms
- 42: I-ms
- 43: B-n_
- 44: I-n_
- 45: B-na
- 46: I-na
- 47: B-nb
- 48: I-nb
- 49: B-nc
- 50: I-nc
- 51: B-ni
- 52: I-ni
- 53: B-no
- 54: I-no
- 55: B-ns
- 56: I-ns
- 57: B-o_
- 58: I-o_
- 59: B-oa
- 60: I-oa
- 61: B-oe
- 62: I-oe
- 63: B-om
- 64: I-om
- 65: B-op
- 66: I-op
- 67: B-or
- 68: I-or
- 69: B-p_
- 70: I-p_
- 71: B-pc
- 72: I-pc
- 73: B-pd
- 74: I-pd
- 75: B-pf
- 76: I-pf
- 77: B-pm
- 78: I-pm
- 79: B-pp
- 80: I-pp
- 81: B-ps
- 82: I-ps
- 83: B-td
- 84: I-td
- 85: B-tf
- 86: I-tf
- 87: B-th
- 88: I-th
- 89: B-tm
- 90: I-tm
- 91: B-ty
- 92: I-ty
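The label ids above follow a regular pattern: id 0 is `O`, and each entity-type code contributes a `B-` label at an odd id followed by its `I-` label at the next even id. A minimal sketch of rebuilding that mapping and grouping tag ids into entity spans is shown here. The names `TYPES`, `LABELS`, `ID2LABEL`, `LABEL2ID`, and `decode_spans` are illustrative helpers, not part of the dataset, and treating a stray `I-` tag as the start of a new span is an assumption about how to handle malformed sequences.

```python
# Entity-type codes in the order implied by the label ids listed above.
TYPES = (
    "ah at az g_ gc gh gl gq gr gs gt gu i_ ia ic if io me mi mn ms "
    "n_ na nb nc ni no ns o_ oa oe om op or p_ pc pd pf pm pp ps "
    "td tf th tm ty"
).split()

# id 0 is "O"; type number k (0-based) gets B- at 2k+1 and I- at 2k+2.
LABELS = ["O"] + [f"{prefix}-{t}" for t in TYPES for prefix in ("B", "I")]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: i for i, label in ID2LABEL.items()}


def decode_spans(tag_ids):
    """Group BIO tag ids into (type, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, tag_id in enumerate(tag_ids):
        prefix, _, etype = ID2LABEL[tag_id].partition("-")
        if prefix == "I" and etype == current:
            continue  # still inside the open span
        if current is not None:
            spans.append((current, start, i))  # close the open span
            start, current = None, None
        if prefix in ("B", "I"):  # assumption: a stray I- opens a new span
            start, current = i, etype
    if current is not None:
        spans.append((current, start, len(tag_ids)))
    return spans
```

For example, `decode_spans([0, 79, 80, 0, 23])` reads the tags as `O, B-pp, I-pp, O, B-gu` and yields one `pp` span over positions 1 to 2 and one `gu` span at position 4.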
Dataset splits
- train: 7193 examples, 2798586 bytes
- validation: 900 examples, 350253 bytes
- test: 899 examples, 352146 bytes
Dataset size
- Download size: 1219405 bytes
- Dataset size: 3500985 bytes
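The split and size figures can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the listing and confirms that the per-split byte counts add up to the reported dataset size, and that the splits are close to an 80/10/10 partition. Reading the download size as the size of the stored files and the dataset size as the size after loading follows the usual Hugging Face convention and is an assumption here, not something this page states.

```python
# Figures copied from the listing: split -> (num_examples, num_bytes).
SPLITS = {
    "train": (7193, 2798586),
    "validation": (900, 350253),
    "test": (899, 352146),
}
DATASET_SIZE = 3500985
DOWNLOAD_SIZE = 1219405

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# Share of examples per split, as a fraction of all examples.
shares = {name: n / total_examples for name, (n, _) in SPLITS.items()}

# Ratio of downloaded bytes to dataset bytes.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"bytes match: {total_bytes == DATASET_SIZE}")
```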
Configuration
- default: contains the data file paths for the three splits train, validation, and test
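The `default` configuration assigns files to splits with glob patterns (`data/train-*`, `data/validation-*`, `data/test-*`). A minimal sketch of how such patterns route a file path to a split is given here, using Python's standard `fnmatch` module. The helper `split_for` and the example file name are hypothetical; the actual file names in the repository are not shown on this page. With the `datasets` library installed, the data would normally be fetched with `load_dataset("stulcrad/CNEC2_0_flat")` rather than matched by hand.

```python
from fnmatch import fnmatchcase

# Split -> glob pattern, as declared in the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file name, used only to illustrate the matching.
print(split_for("data/train-00000-of-00001.parquet"))
```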



