
ndamulelonemakh/za_chunk

Hugging Face, updated 2024-06-04, indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/ndamulelonemakh/za_chunk
Resource description:
--- dataset_info: - config_name: nbl features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': A '1': ADJ00 '2': ADJ01 '3': ADJ02 '4': ADJ03 '5': ADJ04 '6': ADJ05 '7': ADJ06 '8': ADJ07 '9': ADJ08 '10': ADJ09 '11': ADV '12': CONJ '13': COP '14': DEM '15': DEM01 '16': DEM02 '17': DEM03 '18': IDEOPH '19': INT '20': LOC '21': M '22': N '23': N00 '24': N01 '25': N02 '26': N03 '27': N04 '28': N05 '29': N06 '30': N07 '31': N08 '32': N09 '33': N10 '34': N11 '35': N14 '36': NUM01 '37': NUM03 '38': NUM04 '39': NUM05 '40': NUM09 '41': NUM15 '42': P '43': POS '44': POS01 '45': POS02 '46': POS03 '47': POS04 '48': POS05 '49': POS06 '50': POS07 '51': POS08 '52': POS09 '53': POS10 '54': POS14 '55': POS15 '56': PRES '57': PRN '58': PRON01 '59': PRON02 '60': PRON03 '61': PRON04 '62': PRON05 '63': PRON07 '64': PRON09 '65': PRON10 '66': PRON15 '67': PUNCT '68': QUANT01 '69': QUANT02 '70': QUANT03 '71': QUANT04 '72': QUANT05 '73': QUANT06 '74': QUANT07 '75': QUANT08 '76': QUANT09 '77': QUANT10 '78': QUANT11 '79': QUANT14 '80': QUANT15 '81': REL '82': REL01 '83': REL02 '84': REL03 '85': REL04 '86': REL05 '87': REL06 '88': REL07 '89': REL08 '90': REL09 '91': REL10 '92': REL14 '93': REL15 '94': V - name: lang dtype: string splits: - name: train num_bytes: 678150 num_examples: 1658 - name: validation num_bytes: 90955 num_examples: 207 - name: test num_bytes: 93000 num_examples: 208 download_size: 304867 dataset_size: 862105 - config_name: nso features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': ADJ01 '1': ADJ02 '2': ADJ03 '3': ADJ04 '4': ADJ05 '5': ADJ06 '6': ADJ07 '7': ADJ08 '8': ADJ09 '9': ADJ10 '10': ADJ14 '11': ADJLOC '12': ADV '13': CD01 '14': CD02 '15': CD03 '16': CD04 '17': CD05 '18': CD06 '19': CD07 '20': CD08 '21': CD09 '22': CD10 '23': CD14 '24': CD15 '25': CD16 '26': CD18 '27': CO01 '28': CO02 '29': CO03 '30': CO04 '31': CO05 '32': CO06 '33': CO07 '34': CO08 '35': CO09 '36': CO10 '37': CO14 '38': CONJ '39': 
COPERS '40': CPOSS01 '41': CPOSS02 '42': CPOSS03 '43': CPOSS04 '44': CPOSS05 '45': CPOSS06 '46': CPOSS07 '47': CPOSS08 '48': CPOSS09 '49': CPOSS10 '50': CPOSS14 '51': CPOSS15 '52': CPOSS17 '53': CPOSSPERS '54': CS01 '55': CS02 '56': CS03 '57': CS04 '58': CS05 '59': CS06 '60': CS07 '61': CS08 '62': CS09 '63': CS10 '64': CS14 '65': CS15 '66': CSINDEF '67': CSLOC '68': CSNEUT '69': CSPERS '70': ENUM '71': IDEO '72': INF '73': INT '74': MORPHFUT '75': MORPHNEG '76': MORPHPER '77': MORPHPOT '78': MORPHPRES '79': N01 '80': N01a '81': N02 '82': N02b '83': N03 '84': N04 '85': N05 '86': N06 '87': N07 '88': N08 '89': N09 '90': N10 '91': N14 '92': N16 '93': N17 '94': N18 '95': NLOC '96': NOLABEL '97': PART '98': PARTQUE '99': PROEMP01 '100': PROEMP02 '101': PROEMP03 '102': PROEMP04 '103': PROEMP05 '104': PROEMP06 '105': PROEMP07 '106': PROEMP08 '107': PROEMP09 '108': PROEMP10 '109': PROEMP14 '110': PROEMPLOC '111': PROEMPPERS '112': PROPOSS02 '113': PROPOSS03 '114': PROPOSS04 '115': PROPOSS05 '116': PROPOSS06 '117': PROPOSS07 '118': PROPOSS08 '119': PROPOSS09 '120': PROPOSS10 '121': PROPOSS14 '122': PROPOSSPERS '123': PROQUANT04 '124': PROQUANT10 '125': PROQUANT14 '126': PROQUANTLOC '127': QUE '128': RO '129': RS '130': RV '131': V '132': VAUX '133': VCOP '134': VS '135': ZE '136': ZM '137': ZPL '138': ZPR - name: lang dtype: string splits: - name: train num_bytes: 917356 num_examples: 1544 - name: validation num_bytes: 124533 num_examples: 193 - name: test num_bytes: 126562 num_examples: 194 download_size: 275384 dataset_size: 1168451 - config_name: sot features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': ADJ01 '1': ADJ02 '2': ADJ03 '3': ADJ04 '4': ADJ05 '5': ADJ06 '6': ADJ07 '7': ADJ08 '8': ADJ09 '9': ADJ10 '10': ADJ14 '11': ADJ15 '12': ADJC01 '13': ADJC02 '14': ADJC03 '15': ADJC04 '16': ADJC05 '17': ADJC06 '18': ADJC07 '19': ADJC08 '20': ADJC09 '21': ADJC10 '22': ADJC14 '23': ADJC15 '24': ADJLOC '25': ADV '26': CCOP07 '27': CCOP09 '28': 
CCOP10 '29': CCOPPERS '30': CD01 '31': CD02 '32': CD03 '33': CD04 '34': CD05 '35': CD06 '36': CD07 '37': CD08 '38': CD09 '39': CD10 '40': CD14 '41': CD15 '42': CD17 '43': CD18 '44': CDLOC '45': CN15 '46': CO01 '47': CO02 '48': CO03 '49': CO04 '50': CO05 '51': CO06 '52': CO07 '53': CO08 '54': CO09 '55': CO10 '56': CO14 '57': CO15 '58': COLOC '59': CONJ '60': COPERS '61': CPOSS01 '62': CPOSS02 '63': CPOSS03 '64': CPOSS04 '65': CPOSS05 '66': CPOSS06 '67': CPOSS07 '68': CPOSS08 '69': CPOSS09 '70': CPOSS10 '71': CPOSS14 '72': CPOSS15 '73': CPOSSLOC '74': CS01 '75': CS02 '76': CS03 '77': CS04 '78': CS05 '79': CS06 '80': CS07 '81': CS08 '82': CS09 '83': CS10 '84': CS14 '85': CS15 '86': CSINDEF '87': CSLOC '88': CSNEUT '89': CSPERS '90': ENUM '91': IDEO '92': INT '93': MNEG '94': MORPHFUT '95': MORPHPOT '96': MORPHPRES '97': MORPHPROG '98': N01 '99': N01a '100': N02 '101': N02b '102': N03 '103': N04 '104': N05 '105': N06 '106': N07 '107': N08 '108': N09 '109': N10 '110': N14 '111': N15 '112': N16 '113': N18 '114': NLOC '115': NPP '116': PART '117': PROEMP01 '118': PROEMP02 '119': PROEMP03 '120': PROEMP04 '121': PROEMP05 '122': PROEMP06 '123': PROEMP07 '124': PROEMP08 '125': PROEMP09 '126': PROEMP10 '127': PROEMP14 '128': PROEMP15 '129': PROEMPLOC '130': PROEMPPERS '131': PROPOSS02 '132': PROPOSS03 '133': PROPOSS05 '134': PROPOSS06 '135': PROPOSS07 '136': PROPOSS08 '137': PROPOSS09 '138': PROPOSS10 '139': PROPOSS14 '140': PROPOSS15 '141': PROPOSSLOC '142': PROPOSSPERS '143': PROQUANT01 '144': PROQUANT02 '145': PROQUANT03 '146': PROQUANT04 '147': PROQUANT05 '148': PROQUANT06 '149': PROQUANT07 '150': PROQUANT08 '151': PROQUANT09 '152': PROQUANT10 '153': PROQUANT14 '154': PROQUANTLOC '155': PRPOSSPERS '156': QUE '157': RA '158': RF '159': RK '160': RL '161': RO '162': RS '163': RSF '164': RV '165': V '166': VAUX '167': VCOP '168': ZE '169': ZM '170': ZPL '171': ZPR - name: lang dtype: string splits: - name: train num_bytes: 922621 num_examples: 1533 - name: validation 
num_bytes: 125323 num_examples: 191 - name: test num_bytes: 113230 num_examples: 193 download_size: 280169 dataset_size: 1161174 - config_name: ssw features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': A '1': ADJ00 '2': ADJ01 '3': ADJ02 '4': ADJ03 '5': ADJ04 '6': ADJ05 '7': ADJ06 '8': ADJ07 '9': ADJ08 '10': ADJ09 '11': ADV '12': CONJ '13': COP '14': DEM '15': DEM01 '16': DEM02 '17': DEM03 '18': IDEOPH '19': INT '20': LOC '21': M '22': N '23': N00 '24': N01 '25': N02 '26': N03 '27': N04 '28': N05 '29': N06 '30': N07 '31': N08 '32': N09 '33': N11 '34': N14 '35': NUM05 '36': NUM09 '37': NUM15 '38': P '39': POS '40': POS01 '41': POS02 '42': POS03 '43': POS04 '44': POS05 '45': POS06 '46': POS07 '47': POS09 '48': POS11 '49': POS14 '50': POS15 '51': PRES '52': PRN '53': PRON01 '54': PRON02 '55': PRON03 '56': PRON04 '57': PRON05 '58': PRON06 '59': PRON07 '60': PRON09 '61': PRON11 '62': PRON14 '63': PRON15 '64': PUNCT '65': QUANT01 '66': QUANT02 '67': QUANT03 '68': QUANT04 '69': QUANT05 '70': QUANT06 '71': QUANT07 '72': QUANT08 '73': QUANT09 '74': QUANT10 '75': QUANT11 '76': QUANT15 '77': REL '78': REL01 '79': REL02 '80': REL03 '81': REL04 '82': REL05 '83': REL06 '84': REL07 '85': REL08 '86': REL09 '87': REL10 '88': REL11 '89': REL14 '90': REL15 '91': V - name: lang dtype: string splits: - name: train num_bytes: 714304 num_examples: 1619 - name: validation num_bytes: 88181 num_examples: 202 - name: test num_bytes: 79673 num_examples: 203 download_size: 306483 dataset_size: 882158 - config_name: tsn features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': ADJ01 '1': ADJ02 '2': ADJ03 '3': ADJ04 '4': ADJ05 '5': ADJ06 '6': ADJ07 '7': ADJ08 '8': ADJ09 '9': ADJ10 '10': ADJ14 '11': ADJLOC '12': ADV '13': CD01 '14': CD02 '15': CD03 '16': CD04 '17': CD05 '18': CD06 '19': CD07 '20': CD08 '21': CD09 '22': CD10 '23': CD11 '24': CD14 '25': CD15 '26': CD16 '27': CDLOC '28': CN15 '29': CO01 '30': CO02 '31': CO04 '32': 
CO05 '33': CO06 '34': CO07 '35': CO08 '36': CO09 '37': CO10 '38': CO14 '39': CO17 '40': CONJ '41': COPERS '42': CPOSS01 '43': CPOSS02 '44': CPOSS03 '45': CPOSS04 '46': CPOSS05 '47': CPOSS06 '48': CPOSS07 '49': CPOSS08 '50': CPOSS09 '51': CPOSS10 '52': CPOSS11 '53': CPOSS14 '54': CPOSS15 '55': CPOSS17 '56': CPOSS18 '57': CPOSSLOC '58': CS01 '59': CS02 '60': CS03 '61': CS04 '62': CS05 '63': CS06 '64': CS07 '65': CS08 '66': CS09 '67': CS10 '68': CS11 '69': CS14 '70': CS15 '71': CSINDEF '72': CSLOC '73': CSNEUT '74': CSPERS '75': ENUM '76': INT '77': MNEG '78': N01 '79': N01a '80': N02 '81': N02b '82': N03 '83': N04 '84': N05 '85': N06 '86': N07 '87': N08 '88': N09 '89': N10 '90': N11 '91': N14 '92': N15 '93': N17 '94': N18 '95': NLOC '96': NPP '97': PART '98': PROEMP01 '99': PROEMP02 '100': PROEMP03 '101': PROEMP04 '102': PROEMP05 '103': PROEMP06 '104': PROEMP07 '105': PROEMP09 '106': PROEMP10 '107': PROEMP14 '108': PROEMPLOC '109': PROEMPPERS '110': PROPOSS02 '111': PROPOSSPERS '112': PROQUANT01 '113': PROQUANT02 '114': PROQUANT04 '115': PROQUANT06 '116': PROQUANT07 '117': PROQUANT09 '118': PROQUANT10 '119': PROQUANT11 '120': PROQUANT14 '121': PROQUANT15 '122': PROQUANTLOC '123': PROQUANTPERS '124': QUE '125': RA '126': RF '127': RK '128': RL '129': RS '130': RV '131': TENSE '132': V '133': VAUX '134': VCOP '135': ZE '136': ZM '137': ZPL '138': ZPR - name: lang dtype: string splits: - name: train num_bytes: 922598 num_examples: 1593 - name: validation num_bytes: 117207 num_examples: 199 - name: test num_bytes: 120921 num_examples: 200 download_size: 284279 dataset_size: 1160726 - config_name: tso features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': ADJ01 '1': ADJ02 '2': ADJ03 '3': ADJ04 '4': ADJ05 '5': ADJ06 '6': ADJ07 '7': ADJ08 '8': ADJ09 '9': ADJ10 '10': ADJ11 '11': ADJ14 '12': ADV '13': ASP '14': CCOP05 '15': CCOP06 '16': CCOP14 '17': CD01 '18': CD02 '19': CD03 '20': CD04 '21': CD05 '22': CD06 '23': CD07 '24': CD08 '25': CD09 
'26': CD10 '27': CD14 '28': CD15 '29': CDLOC '30': CN15 '31': CO01 '32': CO02 '33': CO03 '34': CO04 '35': CO05 '36': CO06 '37': CO07 '38': CO08 '39': CO09 '40': CO10 '41': CO14 '42': CO15 '43': CONJ '44': COPERS '45': CPOSS01 '46': CPOSS02 '47': CPOSS03 '48': CPOSS04 '49': CPOSS05 '50': CPOSS06 '51': CPOSS07 '52': CPOSS08 '53': CPOSS09 '54': CPOSS10 '55': CPOSS14 '56': CPOSS15 '57': CPOSS17 '58': CPOSSLOC '59': CS01 '60': CS02 '61': CS03 '62': CS04 '63': CS05 '64': CS06 '65': CS07 '66': CS08 '67': CS09 '68': CS10 '69': CS14 '70': CS15 '71': CSLOC '72': CSNEUT '73': CSPERS '74': ENUM '75': IDEO '76': INT '77': MNEG '78': N01 '79': N01a '80': N02 '81': N03 '82': N04 '83': N05 '84': N06 '85': N07 '86': N08 '87': N09 '88': N10 '89': N11 '90': N14 '91': N16 '92': N18 '93': NLOC '94': PART '95': PROEMP01 '96': PROEMP02 '97': PROEMP03 '98': PROEMP04 '99': PROEMP05 '100': PROEMP06 '101': PROEMP07 '102': PROEMP08 '103': PROEMP09 '104': PROEMP10 '105': PROEMP14 '106': PROEMP15 '107': PROEMPLOC '108': PROEMPPERS '109': PROPOSS02 '110': PROPOSS08 '111': PROPOSSPERS '112': PROQUANT01 '113': PROQUANT02 '114': PROQUANT03 '115': PROQUANT04 '116': PROQUANT05 '117': PROQUANT06 '118': PROQUANT07 '119': PROQUANT08 '120': PROQUANT09 '121': PROQUANT10 '122': PROQUANT11 '123': PROQUANT14 '124': PROQUANT15 '125': PROQUANTLOC '126': QUE '127': RA '128': RK '129': RL '130': RO '131': RPF '132': RS '133': RV '134': RWD '135': TENSE '136': V '137': VAUX '138': VCOP '139': ZE '140': ZM '141': ZPL '142': ZPR - name: lang dtype: string splits: - name: train num_bytes: 944279 num_examples: 1561 - name: validation num_bytes: 109624 num_examples: 195 - name: test num_bytes: 125178 num_examples: 196 download_size: 280301 dataset_size: 1179081 - config_name: ven features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': ADJ01 '1': ADJ02 '2': ADJ03 '3': ADJ04 '4': ADJ05 '5': ADJ06 '6': ADJ07 '7': ADJ08 '8': ADJ09 '9': ADJ10 '10': ADJ11 '11': ADJ14 '12': ADJ15 '13': ADJ16 
'14': ADJ17 '15': ADJLOC '16': ADV '17': ASP '18': CCOP02 '19': CCOP05 '20': CCOP06 '21': CCOP07 '22': CCOP08 '23': CCOP09 '24': CCOP10 '25': CD01 '26': CD02 '27': CD03 '28': CD04 '29': CD05 '30': CD06 '31': CD07 '32': CD08 '33': CD09 '34': CD10 '35': CD14 '36': CD15 '37': CD16 '38': CD17 '39': CD20 '40': CDLOC '41': CN15 '42': CO01 '43': CO02 '44': CO04 '45': CO05 '46': CO06 '47': CO07 '48': CO08 '49': CO09 '50': CO10 '51': CONJ '52': CP15 '53': CPOSS01 '54': CPOSS02 '55': CPOSS03 '56': CPOSS04 '57': CPOSS05 '58': CPOSS06 '59': CPOSS07 '60': CPOSS08 '61': CPOSS09 '62': CPOSS10 '63': CPOSS11 '64': CPOSS17 '65': CPOSS20 '66': CPOSSLOC '67': CREL01 '68': CREL02 '69': CREL03 '70': CREL04 '71': CREL05 '72': CREL06 '73': CREL07 '74': CREL08 '75': CREL09 '76': CREL10 '77': CREL11 '78': CREL14 '79': CREL15 '80': CREL16 '81': CREL17 '82': CREL20 '83': CS01 '84': CS02 '85': CS03 '86': CS04 '87': CS05 '88': CS06 '89': CS07 '90': CS08 '91': CS09 '92': CS10 '93': CS11 '94': CS14 '95': CS15 '96': CS16 '97': CS17 '98': CS18 '99': CS20 '100': CSPERS '101': ENUM '102': IDEO '103': INTER '104': MNEG '105': MORPHFUT '106': MORPHNEG '107': N01 '108': N01a '109': N02 '110': N02b '111': N03 '112': N04 '113': N05 '114': N06 '115': N07 '116': N08 '117': N09 '118': N10 '119': N11 '120': N14 '121': N16 '122': N18 '123': N20 '124': NLOC '125': NPP '126': PART '127': PARTCOPNEG '128': PARTQUE '129': PREPOS '130': PROABS01 '131': PROABS02 '132': PROABS03 '133': PROABS04 '134': PROABS06 '135': PROABS07 '136': PROABS08 '137': PROABS09 '138': PROABS10 '139': PROABS14 '140': PROPOSS01 '141': PROPOSS02 '142': PROPOSS03 '143': PROPOSS04 '144': PROPOSS05 '145': PROPOSS06 '146': PROPOSS07 '147': PROPOSS08 '148': PROPOSS09 '149': PROPOSS10 '150': PROPOSS11 '151': PROPOSS14 '152': PROPOSS15 '153': PROPOSS16 '154': PROPOSS17 '155': PROPOSS20 '156': PROPOSSLOC '157': PROPOSSPERS '158': PROPRESREL01 '159': PROPRESREL03 '160': PROPRESREL06 '161': PROPRESREL09 '162': PROPRESREL14 '163': PROPRESREL15 '164': 
PROPRESREL16 '165': PROPRESREL17 '166': PROPSTREL01 '167': PROPSTREL02 '168': PROPSTREL03 '169': PROPSTREL05 '170': PROPSTREL06 '171': PROPSTREL07 '172': PROPSTREL08 '173': PROPSTREL09 '174': PROPSTREL10 '175': PROPSTREL11 '176': PROPSTREL16 '177': PROPSTREL17 '178': PROQUANT01 '179': PROQUANT02 '180': PROQUANT03 '181': PROQUANT05 '182': PROQUANT06 '183': PROQUANT07 '184': PROQUANT08 '185': PROQUANT09 '186': PROQUANT10 '187': PROQUANT11 '188': PROQUANT14 '189': PROQUANT16 '190': PROQUANTLOC '191': PROQUANTPERS '192': PROREFL01 '193': PROREFL02 '194': PROREFLPERS '195': PROREL01 '196': PROREL02 '197': PROREL03 '198': PROREL04 '199': PROREL05 '200': PROREL06 '201': PROREL07 '202': PROREL08 '203': PROREL09 '204': PROREL10 '205': PROREL11 '206': PROREL14 '207': PROREL16 '208': PROREL17 '209': PROREL20 '210': QUE '211': RK '212': RS '213': RV '214': TENSE '215': V '216': VAUX '217': VBPREF '218': VCOP '219': ZE '220': ZM '221': ZPL '222': ZPR - name: lang dtype: string splits: - name: train num_bytes: 908224 num_examples: 1680 - name: validation num_bytes: 112219 num_examples: 210 - name: test num_bytes: 119436 num_examples: 210 download_size: 275219 dataset_size: 1139879 - config_name: xho features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': A '1': ADJ00 '2': ADJ01 '3': ADJ02 '4': ADJ04 '5': ADJ05 '6': ADJ06 '7': ADJ07 '8': ADJ08 '9': ADJ09 '10': ADV '11': CONJ '12': COP '13': DEM '14': DEM01 '15': DEM02 '16': DEM03 '17': IDEOPH '18': INT '19': LOC '20': M '21': N '22': N00 '23': N01 '24': N02 '25': N03 '26': N04 '27': N05 '28': N06 '29': N07 '30': N08 '31': N09 '32': N10 '33': N11 '34': N14 '35': P '36': POS '37': POS01 '38': POS04 '39': POS05 '40': POS06 '41': POS07 '42': POS10 '43': POS11 '44': POS14 '45': POS15 '46': PRES '47': PRN '48': PRON01 '49': PRON02 '50': PRON03 '51': PRON04 '52': PRON07 '53': PRON10 '54': PRON14 '55': PUNCT '56': QUANT01 '57': QUANT04 '58': QUANT05 '59': QUANT06 '60': QUANT10 '61': QUANT11 '62': QUANT14 
'63': QUANT15 '64': REL '65': REL01 '66': REL02 '67': REL04 '68': REL05 '69': REL06 '70': REL10 '71': REL11 '72': REL14 '73': REL15 '74': V - name: lang dtype: string splits: - name: train num_bytes: 736627 num_examples: 1610 - name: validation num_bytes: 97129 num_examples: 201 - name: test num_bytes: 92764 num_examples: 202 download_size: 327155 dataset_size: 926520 - config_name: zul features: - name: tokens sequence: string - name: tags sequence: class_label: names: '0': ADJ '1': ADJ00 '2': ADJ01 '3': ADJ02 '4': ADJ03 '5': ADJ04 '6': ADJ05 '7': ADJ06 '8': ADJ07 '9': ADJ08 '10': ADJ09 '11': ADV '12': CONJ '13': COP '14': DEM '15': DEM01 '16': DEM02 '17': DEM03 '18': IDEOPH '19': INT '20': LOC '21': M '22': N '23': N00 '24': N01 '25': N02 '26': N03 '27': N04 '28': N05 '29': N06 '30': N07 '31': N08 '32': N09 '33': N10 '34': N11 '35': N14 '36': NUM03 '37': NUM06 '38': NUM07 '39': NUM09 '40': NUM15 '41': P '42': POS '43': POS01 '44': POS02 '45': POS03 '46': POS04 '47': POS05 '48': POS06 '49': POS07 '50': POS08 '51': POS09 '52': POS10 '53': POS11 '54': POS14 '55': POS15 '56': PRES '57': PRON '58': PRON01 '59': PRON02 '60': PRON03 '61': PRON04 '62': PRON05 '63': PRON06 '64': PRON07 '65': PRON08 '66': PRON09 '67': PRON10 '68': PRON11 '69': PRON15 '70': PUNCT '71': QUANT01 '72': QUANT02 '73': QUANT03 '74': QUANT04 '75': QUANT05 '76': QUANT06 '77': QUANT07 '78': QUANT08 '79': QUANT09 '80': QUANT10 '81': QUANT11 '82': QUANT14 '83': QUANT15 '84': REL '85': REL01 '86': REL02 '87': REL03 '88': REL04 '89': REL05 '90': REL06 '91': REL07 '92': REL08 '93': REL09 '94': REL10 '95': REL11 '96': REL14 '97': REL15 '98': V - name: lang dtype: string splits: - name: train num_bytes: 740599 num_examples: 1701 - name: validation num_bytes: 89806 num_examples: 212 - name: test num_bytes: 86779 num_examples: 214 download_size: 319265 dataset_size: 917184 configs: - config_name: nbl data_files: - split: train path: nbl/train-* - split: validation path: nbl/validation-* - split: test path: 
nbl/test-* - config_name: nso data_files: - split: train path: nso/train-* - split: validation path: nso/validation-* - split: test path: nso/test-* - config_name: sot data_files: - split: train path: sot/train-* - split: validation path: sot/validation-* - split: test path: sot/test-* - config_name: ssw data_files: - split: train path: ssw/train-* - split: validation path: ssw/validation-* - split: test path: ssw/test-* - config_name: tsn data_files: - split: train path: tsn/train-* - split: validation path: tsn/validation-* - split: test path: tsn/test-* - config_name: tso data_files: - split: train path: tso/train-* - split: validation path: tso/validation-* - split: test path: tso/test-* - config_name: ven data_files: - split: train path: ven/train-* - split: validation path: ven/validation-* - split: test path: ven/test-* - config_name: xho data_files: - split: train path: xho/train-* - split: validation path: xho/validation-* - split: test path: xho/test-* - config_name: zul data_files: - split: train path: zul/train-* - split: validation path: zul/validation-* - split: test path: zul/test-* ---
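
The class_label names in the metadata above are stored as a zero-based mapping from quoted index to label, so a list whose last index is '94' (as for nbl) holds 95 labels. The following is a minimal sketch of reading such a flattened fragment with a regular expression; the helper name parse_label_names and the four-entry excerpt are illustrative only, and the approach assumes label names contain no whitespace.

```python
import re
from typing import Dict


def parse_label_names(fragment: str) -> Dict[int, str]:
    """Parse a flattened "'0': A '1': ADJ00 ..." fragment into {index: label}.

    Hypothetical helper: it assumes every label is a single
    whitespace-free token, which holds for the lists shown above.
    """
    pairs = re.findall(r"'(\d+)':\s*(\S+)", fragment)
    return {int(index): name for index, name in pairs}


# Truncated excerpt of the nbl label list, for illustration only.
excerpt = "'0': A '1': ADJ00 '2': ADJ01 '3': ADJ02"
labels = parse_label_names(excerpt)

# Indices are zero-based, so the label count is the highest index plus one.
num_labels = max(labels) + 1
print(num_labels, labels)
```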

This dataset contains text in nine South African languages (configurations nbl, nso, sot, ssw, tsn, tso, ven, xho, and zul), designed for natural language processing tasks. Each language configuration provides token sequences (tokens), corresponding tag sequences (tags) that cover various grammatical or semantic categories, and a language field (lang). Each configuration is divided into training, validation, and test sets for model training and evaluation.
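
As a rough illustration of how one configuration might be read, the sketch below uses the Hugging Face datasets library and maps the integer tag ids back to their label names. It assumes the dataset is still published on the Hub under the repository id ndamulelonemakh/za_chunk (taken from the download link above) and that the datasets package is installed; the helper names decode_tags, load_config, and show_first_example are made up for this example.

```python
from typing import List, Sequence


def decode_tags(tag_ids: Sequence[int], names: Sequence[str]) -> List[str]:
    """Map integer class-label ids to their string label names."""
    return [names[i] for i in tag_ids]


def load_config(config_name: str = "nbl"):
    """Load one language configuration from the Hub (requires network access).

    Assumption: the repository id below matches the download link in this page.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset("ndamulelonemakh/za_chunk", config_name)


def show_first_example(config_name: str = "nbl") -> None:
    """Print token/tag pairs for the first training example of a configuration."""
    ds = load_config(config_name)
    # tags is a sequence of class labels, so the names sit on the inner feature.
    names = ds["train"].features["tags"].feature.names
    example = ds["train"][0]
    for token, tag in zip(example["tokens"], decode_tags(example["tags"], names)):
        print(token, tag)
```

Calling show_first_example("zul") would download the zul configuration and print one token per line with its tag; nothing is downloaded until one of the loading functions is called.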
Provider:
ndamulelonemakh
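Summary of original information

Dataset overview

The dataset contains multiple language configurations, each with its own features and splits. Details for each configuration are listed below: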

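Configuration: nbl

  • Features
    • tokens: sequence of strings
    • tags: sequence of class labels, with 95 distinct labels (indices 0 to 94), such as A, ADJ00, ADJ01
    • lang: string
  • Splits
    • train: 1658 examples, 678150 bytes
    • validation: 207 examples, 90955 bytes
    • test: 208 examples, 93000 bytes
  • Download size: 304867 bytes
  • Dataset size: 862105 bytes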
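
The nbl split counts listed above work out to roughly an 80/10/10 division. A small sketch of that arithmetic follows; the variable and function names are illustrative, and the counts are copied from the listing.

```python
from typing import Dict

# Example counts for the nbl configuration, copied from the listing above.
NBL_SPLITS: Dict[str, int] = {"train": 1658, "validation": 207, "test": 208}


def split_shares(splits: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


shares = split_shares(NBL_SPLITS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```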

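Configuration: nso

  • Features
    • tokens: sequence of strings
    • tags: sequence of class labels, with 139 distinct labels (indices 0 to 138), such as ADJ01, ADJ02, ADJ03
    • lang: string
  • Splits
    • train: 1544 examples, 917356 bytes
    • validation: 193 examples, 124533 bytes
    • test: 194 examples, 126562 bytes
  • Download size: 275384 bytes
  • Dataset size: 1168451 bytes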

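Configuration: sot

  • Features
    • tokens: sequence of strings
    • tags: sequence of class labels, with 172 distinct labels, such as ADJ01, ADJ02, ADJ03
    • lang: string
  • Splits
    • train: 1533 examples, 922621 bytes
    • validation: 191 examples, 125323 bytes
    • test: 193 examples, 113230 bytes
  • Download size: 280169 bytes
  • Dataset size: 1161174 bytes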

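Configuration: ssw

  • Features
    • tokens: sequence of strings
    • tags: sequence of class labels, with 92 distinct labels, such as A, ADJ00, ADJ01
    • lang: string
  • Splits
    • train: 1619 examples, 714304 bytes
    • validation: 202 examples, 88181 bytes
    • test: 203 examples, 79673 bytes
  • Download size: 306483 bytes
  • Dataset size: 882158 bytes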

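Configuration: tsn

  • Features
    • tokens: sequence of strings
    • tags: sequence of class labels, with 139 distinct labels, such as ADJ01, ADJ02, ADJ03
    • lang: string
  • Splits
    • train: 1593 examples, 922598 bytes
    • validation: 199 examples, 117207 bytes
    • test: 200 examples, 120921 bytes
  • Download size: 284279 bytes
  • Dataset size: 1160726 bytes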

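Configuration: tso

  • Features
    • tokens: sequence of strings
    • tags: sequence of class labels, with 143 distinct labels, such as ADJ01, ADJ02, ADJ03
    • lang: string
  • Splits
    • train: 1561 examples, 944279 bytes
    • validation: 195 examples, 109624 bytes
    • test: 196 examples, 125178 bytes
  • Download size: 280301 bytes
  • Dataset size: 1179081 bytes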
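
For the six configurations summarized above, the reported dataset size equals the sum of the three split sizes in bytes. The check below is a small sketch of that arithmetic with the figures copied from the listings; the names SIZES and mismatches are illustrative. The download size is smaller in every case, which is presumably because the hosted files are stored in a compressed form, though the page itself does not say so.

```python
from typing import Dict, List, Tuple

# (train, validation, test) bytes and the reported dataset size per configuration,
# copied from the listings above.
SIZES: Dict[str, Tuple[Tuple[int, int, int], int]] = {
    "nbl": ((678150, 90955, 93000), 862105),
    "nso": ((917356, 124533, 126562), 1168451),
    "sot": ((922621, 125323, 113230), 1161174),
    "ssw": ((714304, 88181, 79673), 882158),
    "tsn": ((922598, 117207, 120921), 1160726),
    "tso": ((944279, 109624, 125178), 1179081),
}


def mismatches(sizes: Dict[str, Tuple[Tuple[int, int, int], int]]) -> List[str]:
    """Return the configurations whose split bytes do not sum to the reported total."""
    return [name for name, (splits, total) in sizes.items() if sum(splits) != total]


print(mismatches(SIZES))
```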