stulcrad/CNEC2_0_flat

Source: Hugging Face. Updated 2024-05-16; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/stulcrad/CNEC2_0_flat
Resource description:
---
language:
- cs
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-ah
          '2': I-ah
          '3': B-at
          '4': I-at
          '5': B-az
          '6': I-az
          '7': B-g_
          '8': I-g_
          '9': B-gc
          '10': I-gc
          '11': B-gh
          '12': I-gh
          '13': B-gl
          '14': I-gl
          '15': B-gq
          '16': I-gq
          '17': B-gr
          '18': I-gr
          '19': B-gs
          '20': I-gs
          '21': B-gt
          '22': I-gt
          '23': B-gu
          '24': I-gu
          '25': B-i_
          '26': I-i_
          '27': B-ia
          '28': I-ia
          '29': B-ic
          '30': I-ic
          '31': B-if
          '32': I-if
          '33': B-io
          '34': I-io
          '35': B-me
          '36': I-me
          '37': B-mi
          '38': I-mi
          '39': B-mn
          '40': I-mn
          '41': B-ms
          '42': I-ms
          '43': B-n_
          '44': I-n_
          '45': B-na
          '46': I-na
          '47': B-nb
          '48': I-nb
          '49': B-nc
          '50': I-nc
          '51': B-ni
          '52': I-ni
          '53': B-no
          '54': I-no
          '55': B-ns
          '56': I-ns
          '57': B-o_
          '58': I-o_
          '59': B-oa
          '60': I-oa
          '61': B-oe
          '62': I-oe
          '63': B-om
          '64': I-om
          '65': B-op
          '66': I-op
          '67': B-or
          '68': I-or
          '69': B-p_
          '70': I-p_
          '71': B-pc
          '72': I-pc
          '73': B-pd
          '74': I-pd
          '75': B-pf
          '76': I-pf
          '77': B-pm
          '78': I-pm
          '79': B-pp
          '80': I-pp
          '81': B-ps
          '82': I-ps
          '83': B-td
          '84': I-td
          '85': B-tf
          '86': I-tf
          '87': B-th
          '88': I-th
          '89': B-tm
          '90': I-tm
          '91': B-ty
          '92': I-ty
  splits:
  - name: train
    num_bytes: 2798586
    num_examples: 7193
  - name: validation
    num_bytes: 350253
    num_examples: 900
  - name: test
    num_bytes: 352146
    num_examples: 899
  download_size: 1219405
  dataset_size: 3500985
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
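Provider:
stulcrad
Summary of original information

Dataset overview

Dataset features

  • tokens: sequence of strings
  • ner_tags: sequence of class labels, with the following classes: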
    • 0: O
    • 1: B-ah
    • 2: I-ah
    • 3: B-at
    • 4: I-at
    • 5: B-az
    • 6: I-az
    • 7: B-g_
    • 8: I-g_
    • 9: B-gc
    • 10: I-gc
    • 11: B-gh
    • 12: I-gh
    • 13: B-gl
    • 14: I-gl
    • 15: B-gq
    • 16: I-gq
    • 17: B-gr
    • 18: I-gr
    • 19: B-gs
    • 20: I-gs
    • 21: B-gt
    • 22: I-gt
    • 23: B-gu
    • 24: I-gu
    • 25: B-i_
    • 26: I-i_
    • 27: B-ia
    • 28: I-ia
    • 29: B-ic
    • 30: I-ic
    • 31: B-if
    • 32: I-if
    • 33: B-io
    • 34: I-io
    • 35: B-me
    • 36: I-me
    • 37: B-mi
    • 38: I-mi
    • 39: B-mn
    • 40: I-mn
    • 41: B-ms
    • 42: I-ms
    • 43: B-n_
    • 44: I-n_
    • 45: B-na
    • 46: I-na
    • 47: B-nb
    • 48: I-nb
    • 49: B-nc
    • 50: I-nc
    • 51: B-ni
    • 52: I-ni
    • 53: B-no
    • 54: I-no
    • 55: B-ns
    • 56: I-ns
    • 57: B-o_
    • 58: I-o_
    • 59: B-oa
    • 60: I-oa
    • 61: B-oe
    • 62: I-oe
    • 63: B-om
    • 64: I-om
    • 65: B-op
    • 66: I-op
    • 67: B-or
    • 68: I-or
    • 69: B-p_
    • 70: I-p_
    • 71: B-pc
    • 72: I-pc
    • 73: B-pd
    • 74: I-pd
    • 75: B-pf
    • 76: I-pf
    • 77: B-pm
    • 78: I-pm
    • 79: B-pp
    • 80: I-pp
    • 81: B-ps
    • 82: I-ps
    • 83: B-td
    • 84: I-td
    • 85: B-tf
    • 86: I-tf
    • 87: B-th
    • 88: I-th
    • 89: B-tm
    • 90: I-tm
    • 91: B-ty
    • 92: I-ty
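
The label list above appears to follow the usual BIO convention: id 0 is O, and each entity type code contributes a B- label at an odd id followed by an I- label at the next even id. The sketch below rebuilds that mapping from the 46 type codes and groups tagged tokens into entity spans. The helper name `decode_spans` and the sample sentence with its tag ids are invented for illustration and do not come from the dataset.

```python
# Entity type codes, in the order they appear in the label list above.
TYPES = (
    "ah at az g_ gc gh gl gq gr gs gt gu i_ ia ic if io me mi mn ms "
    "n_ na nb nc ni no ns o_ oa oe om op or p_ pc pd pf pm pp ps "
    "td tf th tm ty"
).split()

# id 0 is "O"; type k (0-based) gets B- at id 2k+1 and I- at id 2k+2.
ID2LABEL = ["O"] + [f"{prefix}-{t}" for t in TYPES for prefix in ("B", "I")]
LABEL2ID = {label: i for i, label in enumerate(ID2LABEL)}


def decode_spans(tokens, tag_ids):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Hypothetical helper: an I- tag that does not continue the current
    span is treated as the start of a new span.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, tag_ids):
        prefix, _, etype = ID2LABEL[tag_id].partition("-")
        if prefix == "I" and etype == current_type:
            current_tokens.append(token)
            continue
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            current_type, current_tokens = etype, [token]
        else:
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Invented example; the tag choice is illustrative only.
sample_tokens = ["Karel", "žije", "v", "Novém", "Městě"]
sample_tags = [LABEL2ID["B-pf"], 0, 0, LABEL2ID["B-gu"], LABEL2ID["I-gu"]]
print(decode_spans(sample_tokens, sample_tags))
```

Building the list from the type codes rather than typing all 93 labels keeps the id arithmetic explicit, which can help when checking a model's `id2label` configuration against the dataset.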

Dataset splits

  • train: 7193 examples, 2798586 bytes
  • validation: 900 examples, 350253 bytes
  • test: 899 examples, 352146 bytes

Dataset size

  • Download size: 1219405 bytes
  • Dataset size: 3500985 bytes
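
As a quick consistency check on the figures above, the following sketch sums the per-split counts, compares the byte total with the reported dataset size, and prints each split's share of the examples. All numbers are copied from this listing; nothing is read from the dataset itself.

```python
# (examples, bytes) per split, copied from the listing above.
SPLITS = {
    "train": (7193, 2798586),
    "validation": (900, 350253),
    "test": (899, 352146),
}
DOWNLOAD_SIZE = 1219405
DATASET_SIZE = 3500985

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
assert total_bytes == DATASET_SIZE

for name, (n, b) in SPLITS.items():
    share = n / total_examples
    print(f"{name}: {share:.1%} of examples, about {b / n:.0f} bytes per example")

print(f"download size / dataset size: {DOWNLOAD_SIZE / DATASET_SIZE:.2f}")
```

The three splits total 8992 examples in a roughly 80/10/10 ratio, and the download is about 35% of the in-memory dataset size.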

Configuration

  • default: data file paths for the three splits train, validation, and test
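
The default configuration maps each split to a glob pattern under `data/`. The sketch below shows how such patterns assign files to splits, and how the dataset might be loaded with the Hugging Face `datasets` library. The helper names `split_of` and `load_splits` and the shard filename in the usage line are hypothetical; the actual file names in the repository are not given in this listing.

```python
from fnmatch import fnmatchcase

# Split-to-pattern mapping, copied from the default configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_of(path):
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_splits(repo_id="stulcrad/CNEC2_0_flat"):
    """Load all splits of the default configuration.

    Assumes the third-party `datasets` package is installed and that
    network access to the Hugging Face Hub (or a mirror) is available.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset(repo_id)


# Hypothetical shard name, used only to illustrate the pattern matching.
print(split_of("data/train-00000-of-00001.parquet"))
```

Calling `load_splits()` should return a mapping with the keys train, validation, and test; the pattern-matching helper runs offline and is only meant to make the configuration easier to read.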