ndamulelonemakh/za_chunk
Source: Hugging Face. Updated 2024-06-04. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/ndamulelonemakh/za_chunk
Resource description:
---
dataset_info:
- config_name: nbl
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': A
'1': ADJ00
'2': ADJ01
'3': ADJ02
'4': ADJ03
'5': ADJ04
'6': ADJ05
'7': ADJ06
'8': ADJ07
'9': ADJ08
'10': ADJ09
'11': ADV
'12': CONJ
'13': COP
'14': DEM
'15': DEM01
'16': DEM02
'17': DEM03
'18': IDEOPH
'19': INT
'20': LOC
'21': M
'22': N
'23': N00
'24': N01
'25': N02
'26': N03
'27': N04
'28': N05
'29': N06
'30': N07
'31': N08
'32': N09
'33': N10
'34': N11
'35': N14
'36': NUM01
'37': NUM03
'38': NUM04
'39': NUM05
'40': NUM09
'41': NUM15
'42': P
'43': POS
'44': POS01
'45': POS02
'46': POS03
'47': POS04
'48': POS05
'49': POS06
'50': POS07
'51': POS08
'52': POS09
'53': POS10
'54': POS14
'55': POS15
'56': PRES
'57': PRN
'58': PRON01
'59': PRON02
'60': PRON03
'61': PRON04
'62': PRON05
'63': PRON07
'64': PRON09
'65': PRON10
'66': PRON15
'67': PUNCT
'68': QUANT01
'69': QUANT02
'70': QUANT03
'71': QUANT04
'72': QUANT05
'73': QUANT06
'74': QUANT07
'75': QUANT08
'76': QUANT09
'77': QUANT10
'78': QUANT11
'79': QUANT14
'80': QUANT15
'81': REL
'82': REL01
'83': REL02
'84': REL03
'85': REL04
'86': REL05
'87': REL06
'88': REL07
'89': REL08
'90': REL09
'91': REL10
'92': REL14
'93': REL15
'94': V
- name: lang
dtype: string
splits:
- name: train
num_bytes: 678150
num_examples: 1658
- name: validation
num_bytes: 90955
num_examples: 207
- name: test
num_bytes: 93000
num_examples: 208
download_size: 304867
dataset_size: 862105
- config_name: nso
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': ADJ01
'1': ADJ02
'2': ADJ03
'3': ADJ04
'4': ADJ05
'5': ADJ06
'6': ADJ07
'7': ADJ08
'8': ADJ09
'9': ADJ10
'10': ADJ14
'11': ADJLOC
'12': ADV
'13': CD01
'14': CD02
'15': CD03
'16': CD04
'17': CD05
'18': CD06
'19': CD07
'20': CD08
'21': CD09
'22': CD10
'23': CD14
'24': CD15
'25': CD16
'26': CD18
'27': CO01
'28': CO02
'29': CO03
'30': CO04
'31': CO05
'32': CO06
'33': CO07
'34': CO08
'35': CO09
'36': CO10
'37': CO14
'38': CONJ
'39': COPERS
'40': CPOSS01
'41': CPOSS02
'42': CPOSS03
'43': CPOSS04
'44': CPOSS05
'45': CPOSS06
'46': CPOSS07
'47': CPOSS08
'48': CPOSS09
'49': CPOSS10
'50': CPOSS14
'51': CPOSS15
'52': CPOSS17
'53': CPOSSPERS
'54': CS01
'55': CS02
'56': CS03
'57': CS04
'58': CS05
'59': CS06
'60': CS07
'61': CS08
'62': CS09
'63': CS10
'64': CS14
'65': CS15
'66': CSINDEF
'67': CSLOC
'68': CSNEUT
'69': CSPERS
'70': ENUM
'71': IDEO
'72': INF
'73': INT
'74': MORPHFUT
'75': MORPHNEG
'76': MORPHPER
'77': MORPHPOT
'78': MORPHPRES
'79': N01
'80': N01a
'81': N02
'82': N02b
'83': N03
'84': N04
'85': N05
'86': N06
'87': N07
'88': N08
'89': N09
'90': N10
'91': N14
'92': N16
'93': N17
'94': N18
'95': NLOC
'96': NOLABEL
'97': PART
'98': PARTQUE
'99': PROEMP01
'100': PROEMP02
'101': PROEMP03
'102': PROEMP04
'103': PROEMP05
'104': PROEMP06
'105': PROEMP07
'106': PROEMP08
'107': PROEMP09
'108': PROEMP10
'109': PROEMP14
'110': PROEMPLOC
'111': PROEMPPERS
'112': PROPOSS02
'113': PROPOSS03
'114': PROPOSS04
'115': PROPOSS05
'116': PROPOSS06
'117': PROPOSS07
'118': PROPOSS08
'119': PROPOSS09
'120': PROPOSS10
'121': PROPOSS14
'122': PROPOSSPERS
'123': PROQUANT04
'124': PROQUANT10
'125': PROQUANT14
'126': PROQUANTLOC
'127': QUE
'128': RO
'129': RS
'130': RV
'131': V
'132': VAUX
'133': VCOP
'134': VS
'135': ZE
'136': ZM
'137': ZPL
'138': ZPR
- name: lang
dtype: string
splits:
- name: train
num_bytes: 917356
num_examples: 1544
- name: validation
num_bytes: 124533
num_examples: 193
- name: test
num_bytes: 126562
num_examples: 194
download_size: 275384
dataset_size: 1168451
- config_name: sot
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': ADJ01
'1': ADJ02
'2': ADJ03
'3': ADJ04
'4': ADJ05
'5': ADJ06
'6': ADJ07
'7': ADJ08
'8': ADJ09
'9': ADJ10
'10': ADJ14
'11': ADJ15
'12': ADJC01
'13': ADJC02
'14': ADJC03
'15': ADJC04
'16': ADJC05
'17': ADJC06
'18': ADJC07
'19': ADJC08
'20': ADJC09
'21': ADJC10
'22': ADJC14
'23': ADJC15
'24': ADJLOC
'25': ADV
'26': CCOP07
'27': CCOP09
'28': CCOP10
'29': CCOPPERS
'30': CD01
'31': CD02
'32': CD03
'33': CD04
'34': CD05
'35': CD06
'36': CD07
'37': CD08
'38': CD09
'39': CD10
'40': CD14
'41': CD15
'42': CD17
'43': CD18
'44': CDLOC
'45': CN15
'46': CO01
'47': CO02
'48': CO03
'49': CO04
'50': CO05
'51': CO06
'52': CO07
'53': CO08
'54': CO09
'55': CO10
'56': CO14
'57': CO15
'58': COLOC
'59': CONJ
'60': COPERS
'61': CPOSS01
'62': CPOSS02
'63': CPOSS03
'64': CPOSS04
'65': CPOSS05
'66': CPOSS06
'67': CPOSS07
'68': CPOSS08
'69': CPOSS09
'70': CPOSS10
'71': CPOSS14
'72': CPOSS15
'73': CPOSSLOC
'74': CS01
'75': CS02
'76': CS03
'77': CS04
'78': CS05
'79': CS06
'80': CS07
'81': CS08
'82': CS09
'83': CS10
'84': CS14
'85': CS15
'86': CSINDEF
'87': CSLOC
'88': CSNEUT
'89': CSPERS
'90': ENUM
'91': IDEO
'92': INT
'93': MNEG
'94': MORPHFUT
'95': MORPHPOT
'96': MORPHPRES
'97': MORPHPROG
'98': N01
'99': N01a
'100': N02
'101': N02b
'102': N03
'103': N04
'104': N05
'105': N06
'106': N07
'107': N08
'108': N09
'109': N10
'110': N14
'111': N15
'112': N16
'113': N18
'114': NLOC
'115': NPP
'116': PART
'117': PROEMP01
'118': PROEMP02
'119': PROEMP03
'120': PROEMP04
'121': PROEMP05
'122': PROEMP06
'123': PROEMP07
'124': PROEMP08
'125': PROEMP09
'126': PROEMP10
'127': PROEMP14
'128': PROEMP15
'129': PROEMPLOC
'130': PROEMPPERS
'131': PROPOSS02
'132': PROPOSS03
'133': PROPOSS05
'134': PROPOSS06
'135': PROPOSS07
'136': PROPOSS08
'137': PROPOSS09
'138': PROPOSS10
'139': PROPOSS14
'140': PROPOSS15
'141': PROPOSSLOC
'142': PROPOSSPERS
'143': PROQUANT01
'144': PROQUANT02
'145': PROQUANT03
'146': PROQUANT04
'147': PROQUANT05
'148': PROQUANT06
'149': PROQUANT07
'150': PROQUANT08
'151': PROQUANT09
'152': PROQUANT10
'153': PROQUANT14
'154': PROQUANTLOC
'155': PRPOSSPERS
'156': QUE
'157': RA
'158': RF
'159': RK
'160': RL
'161': RO
'162': RS
'163': RSF
'164': RV
'165': V
'166': VAUX
'167': VCOP
'168': ZE
'169': ZM
'170': ZPL
'171': ZPR
- name: lang
dtype: string
splits:
- name: train
num_bytes: 922621
num_examples: 1533
- name: validation
num_bytes: 125323
num_examples: 191
- name: test
num_bytes: 113230
num_examples: 193
download_size: 280169
dataset_size: 1161174
- config_name: ssw
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': A
'1': ADJ00
'2': ADJ01
'3': ADJ02
'4': ADJ03
'5': ADJ04
'6': ADJ05
'7': ADJ06
'8': ADJ07
'9': ADJ08
'10': ADJ09
'11': ADV
'12': CONJ
'13': COP
'14': DEM
'15': DEM01
'16': DEM02
'17': DEM03
'18': IDEOPH
'19': INT
'20': LOC
'21': M
'22': N
'23': N00
'24': N01
'25': N02
'26': N03
'27': N04
'28': N05
'29': N06
'30': N07
'31': N08
'32': N09
'33': N11
'34': N14
'35': NUM05
'36': NUM09
'37': NUM15
'38': P
'39': POS
'40': POS01
'41': POS02
'42': POS03
'43': POS04
'44': POS05
'45': POS06
'46': POS07
'47': POS09
'48': POS11
'49': POS14
'50': POS15
'51': PRES
'52': PRN
'53': PRON01
'54': PRON02
'55': PRON03
'56': PRON04
'57': PRON05
'58': PRON06
'59': PRON07
'60': PRON09
'61': PRON11
'62': PRON14
'63': PRON15
'64': PUNCT
'65': QUANT01
'66': QUANT02
'67': QUANT03
'68': QUANT04
'69': QUANT05
'70': QUANT06
'71': QUANT07
'72': QUANT08
'73': QUANT09
'74': QUANT10
'75': QUANT11
'76': QUANT15
'77': REL
'78': REL01
'79': REL02
'80': REL03
'81': REL04
'82': REL05
'83': REL06
'84': REL07
'85': REL08
'86': REL09
'87': REL10
'88': REL11
'89': REL14
'90': REL15
'91': V
- name: lang
dtype: string
splits:
- name: train
num_bytes: 714304
num_examples: 1619
- name: validation
num_bytes: 88181
num_examples: 202
- name: test
num_bytes: 79673
num_examples: 203
download_size: 306483
dataset_size: 882158
- config_name: tsn
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': ADJ01
'1': ADJ02
'2': ADJ03
'3': ADJ04
'4': ADJ05
'5': ADJ06
'6': ADJ07
'7': ADJ08
'8': ADJ09
'9': ADJ10
'10': ADJ14
'11': ADJLOC
'12': ADV
'13': CD01
'14': CD02
'15': CD03
'16': CD04
'17': CD05
'18': CD06
'19': CD07
'20': CD08
'21': CD09
'22': CD10
'23': CD11
'24': CD14
'25': CD15
'26': CD16
'27': CDLOC
'28': CN15
'29': CO01
'30': CO02
'31': CO04
'32': CO05
'33': CO06
'34': CO07
'35': CO08
'36': CO09
'37': CO10
'38': CO14
'39': CO17
'40': CONJ
'41': COPERS
'42': CPOSS01
'43': CPOSS02
'44': CPOSS03
'45': CPOSS04
'46': CPOSS05
'47': CPOSS06
'48': CPOSS07
'49': CPOSS08
'50': CPOSS09
'51': CPOSS10
'52': CPOSS11
'53': CPOSS14
'54': CPOSS15
'55': CPOSS17
'56': CPOSS18
'57': CPOSSLOC
'58': CS01
'59': CS02
'60': CS03
'61': CS04
'62': CS05
'63': CS06
'64': CS07
'65': CS08
'66': CS09
'67': CS10
'68': CS11
'69': CS14
'70': CS15
'71': CSINDEF
'72': CSLOC
'73': CSNEUT
'74': CSPERS
'75': ENUM
'76': INT
'77': MNEG
'78': N01
'79': N01a
'80': N02
'81': N02b
'82': N03
'83': N04
'84': N05
'85': N06
'86': N07
'87': N08
'88': N09
'89': N10
'90': N11
'91': N14
'92': N15
'93': N17
'94': N18
'95': NLOC
'96': NPP
'97': PART
'98': PROEMP01
'99': PROEMP02
'100': PROEMP03
'101': PROEMP04
'102': PROEMP05
'103': PROEMP06
'104': PROEMP07
'105': PROEMP09
'106': PROEMP10
'107': PROEMP14
'108': PROEMPLOC
'109': PROEMPPERS
'110': PROPOSS02
'111': PROPOSSPERS
'112': PROQUANT01
'113': PROQUANT02
'114': PROQUANT04
'115': PROQUANT06
'116': PROQUANT07
'117': PROQUANT09
'118': PROQUANT10
'119': PROQUANT11
'120': PROQUANT14
'121': PROQUANT15
'122': PROQUANTLOC
'123': PROQUANTPERS
'124': QUE
'125': RA
'126': RF
'127': RK
'128': RL
'129': RS
'130': RV
'131': TENSE
'132': V
'133': VAUX
'134': VCOP
'135': ZE
'136': ZM
'137': ZPL
'138': ZPR
- name: lang
dtype: string
splits:
- name: train
num_bytes: 922598
num_examples: 1593
- name: validation
num_bytes: 117207
num_examples: 199
- name: test
num_bytes: 120921
num_examples: 200
download_size: 284279
dataset_size: 1160726
- config_name: tso
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': ADJ01
'1': ADJ02
'2': ADJ03
'3': ADJ04
'4': ADJ05
'5': ADJ06
'6': ADJ07
'7': ADJ08
'8': ADJ09
'9': ADJ10
'10': ADJ11
'11': ADJ14
'12': ADV
'13': ASP
'14': CCOP05
'15': CCOP06
'16': CCOP14
'17': CD01
'18': CD02
'19': CD03
'20': CD04
'21': CD05
'22': CD06
'23': CD07
'24': CD08
'25': CD09
'26': CD10
'27': CD14
'28': CD15
'29': CDLOC
'30': CN15
'31': CO01
'32': CO02
'33': CO03
'34': CO04
'35': CO05
'36': CO06
'37': CO07
'38': CO08
'39': CO09
'40': CO10
'41': CO14
'42': CO15
'43': CONJ
'44': COPERS
'45': CPOSS01
'46': CPOSS02
'47': CPOSS03
'48': CPOSS04
'49': CPOSS05
'50': CPOSS06
'51': CPOSS07
'52': CPOSS08
'53': CPOSS09
'54': CPOSS10
'55': CPOSS14
'56': CPOSS15
'57': CPOSS17
'58': CPOSSLOC
'59': CS01
'60': CS02
'61': CS03
'62': CS04
'63': CS05
'64': CS06
'65': CS07
'66': CS08
'67': CS09
'68': CS10
'69': CS14
'70': CS15
'71': CSLOC
'72': CSNEUT
'73': CSPERS
'74': ENUM
'75': IDEO
'76': INT
'77': MNEG
'78': N01
'79': N01a
'80': N02
'81': N03
'82': N04
'83': N05
'84': N06
'85': N07
'86': N08
'87': N09
'88': N10
'89': N11
'90': N14
'91': N16
'92': N18
'93': NLOC
'94': PART
'95': PROEMP01
'96': PROEMP02
'97': PROEMP03
'98': PROEMP04
'99': PROEMP05
'100': PROEMP06
'101': PROEMP07
'102': PROEMP08
'103': PROEMP09
'104': PROEMP10
'105': PROEMP14
'106': PROEMP15
'107': PROEMPLOC
'108': PROEMPPERS
'109': PROPOSS02
'110': PROPOSS08
'111': PROPOSSPERS
'112': PROQUANT01
'113': PROQUANT02
'114': PROQUANT03
'115': PROQUANT04
'116': PROQUANT05
'117': PROQUANT06
'118': PROQUANT07
'119': PROQUANT08
'120': PROQUANT09
'121': PROQUANT10
'122': PROQUANT11
'123': PROQUANT14
'124': PROQUANT15
'125': PROQUANTLOC
'126': QUE
'127': RA
'128': RK
'129': RL
'130': RO
'131': RPF
'132': RS
'133': RV
'134': RWD
'135': TENSE
'136': V
'137': VAUX
'138': VCOP
'139': ZE
'140': ZM
'141': ZPL
'142': ZPR
- name: lang
dtype: string
splits:
- name: train
num_bytes: 944279
num_examples: 1561
- name: validation
num_bytes: 109624
num_examples: 195
- name: test
num_bytes: 125178
num_examples: 196
download_size: 280301
dataset_size: 1179081
- config_name: ven
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': ADJ01
'1': ADJ02
'2': ADJ03
'3': ADJ04
'4': ADJ05
'5': ADJ06
'6': ADJ07
'7': ADJ08
'8': ADJ09
'9': ADJ10
'10': ADJ11
'11': ADJ14
'12': ADJ15
'13': ADJ16
'14': ADJ17
'15': ADJLOC
'16': ADV
'17': ASP
'18': CCOP02
'19': CCOP05
'20': CCOP06
'21': CCOP07
'22': CCOP08
'23': CCOP09
'24': CCOP10
'25': CD01
'26': CD02
'27': CD03
'28': CD04
'29': CD05
'30': CD06
'31': CD07
'32': CD08
'33': CD09
'34': CD10
'35': CD14
'36': CD15
'37': CD16
'38': CD17
'39': CD20
'40': CDLOC
'41': CN15
'42': CO01
'43': CO02
'44': CO04
'45': CO05
'46': CO06
'47': CO07
'48': CO08
'49': CO09
'50': CO10
'51': CONJ
'52': CP15
'53': CPOSS01
'54': CPOSS02
'55': CPOSS03
'56': CPOSS04
'57': CPOSS05
'58': CPOSS06
'59': CPOSS07
'60': CPOSS08
'61': CPOSS09
'62': CPOSS10
'63': CPOSS11
'64': CPOSS17
'65': CPOSS20
'66': CPOSSLOC
'67': CREL01
'68': CREL02
'69': CREL03
'70': CREL04
'71': CREL05
'72': CREL06
'73': CREL07
'74': CREL08
'75': CREL09
'76': CREL10
'77': CREL11
'78': CREL14
'79': CREL15
'80': CREL16
'81': CREL17
'82': CREL20
'83': CS01
'84': CS02
'85': CS03
'86': CS04
'87': CS05
'88': CS06
'89': CS07
'90': CS08
'91': CS09
'92': CS10
'93': CS11
'94': CS14
'95': CS15
'96': CS16
'97': CS17
'98': CS18
'99': CS20
'100': CSPERS
'101': ENUM
'102': IDEO
'103': INTER
'104': MNEG
'105': MORPHFUT
'106': MORPHNEG
'107': N01
'108': N01a
'109': N02
'110': N02b
'111': N03
'112': N04
'113': N05
'114': N06
'115': N07
'116': N08
'117': N09
'118': N10
'119': N11
'120': N14
'121': N16
'122': N18
'123': N20
'124': NLOC
'125': NPP
'126': PART
'127': PARTCOPNEG
'128': PARTQUE
'129': PREPOS
'130': PROABS01
'131': PROABS02
'132': PROABS03
'133': PROABS04
'134': PROABS06
'135': PROABS07
'136': PROABS08
'137': PROABS09
'138': PROABS10
'139': PROABS14
'140': PROPOSS01
'141': PROPOSS02
'142': PROPOSS03
'143': PROPOSS04
'144': PROPOSS05
'145': PROPOSS06
'146': PROPOSS07
'147': PROPOSS08
'148': PROPOSS09
'149': PROPOSS10
'150': PROPOSS11
'151': PROPOSS14
'152': PROPOSS15
'153': PROPOSS16
'154': PROPOSS17
'155': PROPOSS20
'156': PROPOSSLOC
'157': PROPOSSPERS
'158': PROPRESREL01
'159': PROPRESREL03
'160': PROPRESREL06
'161': PROPRESREL09
'162': PROPRESREL14
'163': PROPRESREL15
'164': PROPRESREL16
'165': PROPRESREL17
'166': PROPSTREL01
'167': PROPSTREL02
'168': PROPSTREL03
'169': PROPSTREL05
'170': PROPSTREL06
'171': PROPSTREL07
'172': PROPSTREL08
'173': PROPSTREL09
'174': PROPSTREL10
'175': PROPSTREL11
'176': PROPSTREL16
'177': PROPSTREL17
'178': PROQUANT01
'179': PROQUANT02
'180': PROQUANT03
'181': PROQUANT05
'182': PROQUANT06
'183': PROQUANT07
'184': PROQUANT08
'185': PROQUANT09
'186': PROQUANT10
'187': PROQUANT11
'188': PROQUANT14
'189': PROQUANT16
'190': PROQUANTLOC
'191': PROQUANTPERS
'192': PROREFL01
'193': PROREFL02
'194': PROREFLPERS
'195': PROREL01
'196': PROREL02
'197': PROREL03
'198': PROREL04
'199': PROREL05
'200': PROREL06
'201': PROREL07
'202': PROREL08
'203': PROREL09
'204': PROREL10
'205': PROREL11
'206': PROREL14
'207': PROREL16
'208': PROREL17
'209': PROREL20
'210': QUE
'211': RK
'212': RS
'213': RV
'214': TENSE
'215': V
'216': VAUX
'217': VBPREF
'218': VCOP
'219': ZE
'220': ZM
'221': ZPL
'222': ZPR
- name: lang
dtype: string
splits:
- name: train
num_bytes: 908224
num_examples: 1680
- name: validation
num_bytes: 112219
num_examples: 210
- name: test
num_bytes: 119436
num_examples: 210
download_size: 275219
dataset_size: 1139879
- config_name: xho
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': A
'1': ADJ00
'2': ADJ01
'3': ADJ02
'4': ADJ04
'5': ADJ05
'6': ADJ06
'7': ADJ07
'8': ADJ08
'9': ADJ09
'10': ADV
'11': CONJ
'12': COP
'13': DEM
'14': DEM01
'15': DEM02
'16': DEM03
'17': IDEOPH
'18': INT
'19': LOC
'20': M
'21': N
'22': N00
'23': N01
'24': N02
'25': N03
'26': N04
'27': N05
'28': N06
'29': N07
'30': N08
'31': N09
'32': N10
'33': N11
'34': N14
'35': P
'36': POS
'37': POS01
'38': POS04
'39': POS05
'40': POS06
'41': POS07
'42': POS10
'43': POS11
'44': POS14
'45': POS15
'46': PRES
'47': PRN
'48': PRON01
'49': PRON02
'50': PRON03
'51': PRON04
'52': PRON07
'53': PRON10
'54': PRON14
'55': PUNCT
'56': QUANT01
'57': QUANT04
'58': QUANT05
'59': QUANT06
'60': QUANT10
'61': QUANT11
'62': QUANT14
'63': QUANT15
'64': REL
'65': REL01
'66': REL02
'67': REL04
'68': REL05
'69': REL06
'70': REL10
'71': REL11
'72': REL14
'73': REL15
'74': V
- name: lang
dtype: string
splits:
- name: train
num_bytes: 736627
num_examples: 1610
- name: validation
num_bytes: 97129
num_examples: 201
- name: test
num_bytes: 92764
num_examples: 202
download_size: 327155
dataset_size: 926520
- config_name: zul
features:
- name: tokens
sequence: string
- name: tags
sequence:
class_label:
names:
'0': ADJ
'1': ADJ00
'2': ADJ01
'3': ADJ02
'4': ADJ03
'5': ADJ04
'6': ADJ05
'7': ADJ06
'8': ADJ07
'9': ADJ08
'10': ADJ09
'11': ADV
'12': CONJ
'13': COP
'14': DEM
'15': DEM01
'16': DEM02
'17': DEM03
'18': IDEOPH
'19': INT
'20': LOC
'21': M
'22': N
'23': N00
'24': N01
'25': N02
'26': N03
'27': N04
'28': N05
'29': N06
'30': N07
'31': N08
'32': N09
'33': N10
'34': N11
'35': N14
'36': NUM03
'37': NUM06
'38': NUM07
'39': NUM09
'40': NUM15
'41': P
'42': POS
'43': POS01
'44': POS02
'45': POS03
'46': POS04
'47': POS05
'48': POS06
'49': POS07
'50': POS08
'51': POS09
'52': POS10
'53': POS11
'54': POS14
'55': POS15
'56': PRES
'57': PRON
'58': PRON01
'59': PRON02
'60': PRON03
'61': PRON04
'62': PRON05
'63': PRON06
'64': PRON07
'65': PRON08
'66': PRON09
'67': PRON10
'68': PRON11
'69': PRON15
'70': PUNCT
'71': QUANT01
'72': QUANT02
'73': QUANT03
'74': QUANT04
'75': QUANT05
'76': QUANT06
'77': QUANT07
'78': QUANT08
'79': QUANT09
'80': QUANT10
'81': QUANT11
'82': QUANT14
'83': QUANT15
'84': REL
'85': REL01
'86': REL02
'87': REL03
'88': REL04
'89': REL05
'90': REL06
'91': REL07
'92': REL08
'93': REL09
'94': REL10
'95': REL11
'96': REL14
'97': REL15
'98': V
- name: lang
dtype: string
splits:
- name: train
num_bytes: 740599
num_examples: 1701
- name: validation
num_bytes: 89806
num_examples: 212
- name: test
num_bytes: 86779
num_examples: 214
download_size: 319265
dataset_size: 917184
configs:
- config_name: nbl
data_files:
- split: train
path: nbl/train-*
- split: validation
path: nbl/validation-*
- split: test
path: nbl/test-*
- config_name: nso
data_files:
- split: train
path: nso/train-*
- split: validation
path: nso/validation-*
- split: test
path: nso/test-*
- config_name: sot
data_files:
- split: train
path: sot/train-*
- split: validation
path: sot/validation-*
- split: test
path: sot/test-*
- config_name: ssw
data_files:
- split: train
path: ssw/train-*
- split: validation
path: ssw/validation-*
- split: test
path: ssw/test-*
- config_name: tsn
data_files:
- split: train
path: tsn/train-*
- split: validation
path: tsn/validation-*
- split: test
path: tsn/test-*
- config_name: tso
data_files:
- split: train
path: tso/train-*
- split: validation
path: tso/validation-*
- split: test
path: tso/test-*
- config_name: ven
data_files:
- split: train
path: ven/train-*
- split: validation
path: ven/validation-*
- split: test
path: ven/test-*
- config_name: xho
data_files:
- split: train
path: xho/train-*
- split: validation
path: xho/validation-*
- split: test
path: xho/test-*
- config_name: zul
data_files:
- split: train
path: zul/train-*
- split: validation
path: zul/validation-*
- split: test
path: zul/test-*
---
This dataset contains tagged text in nine South African languages, one per configuration (nbl, nso, sot, ssw, tsn, tso, ven, xho, zul), intended for natural language processing tasks. Each example holds a sequence of string tokens (tokens), a sequence of class-label tags (tags) drawn from a label set specific to that configuration, and a language field (lang). The label names cover various grammatical or semantic categories. Each configuration is divided into training, validation, and test splits for model training and evaluation.
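As a minimal sketch of how one configuration might be loaded, the following assumes the Hugging Face `datasets` library is installed and that the repository id `ndamulelonemakh/za_chunk` (taken from the download link above) resolves on the Hub. The names `CONFIGS`, `SPLITS`, `REPO_ID`, `load_config`, and `describe` are illustrative helpers, not part of the dataset.

```python
# Configuration and split names as listed in the dataset card metadata.
CONFIGS = ["nbl", "nso", "sot", "ssw", "tsn", "tso", "ven", "xho", "zul"]
SPLITS = ["train", "validation", "test"]

# Assumption: the repository id matches the path in the download link.
REPO_ID = "ndamulelonemakh/za_chunk"


def load_config(name):
    """Load one language configuration as a DatasetDict (needs network access)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name)


def describe(dataset_dict):
    """Return {split name: number of examples} for a loaded configuration."""
    return {split: len(dataset_dict[split]) for split in SPLITS}
```

Calling `describe(load_config("zul"))` should report the split sizes for the zul configuration; if the hosted files match the metadata above, those would be 1701, 212, and 214 examples.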
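Provider:
ndamulelonemakh
Summary of original information
Dataset overview
The dataset contains multiple language configurations, each with its own features and splits. Details of each configuration follow: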
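Configuration: nbl
- Features:
  - tokens: sequence of strings
  - tags: sequence of class labels, with 95 distinct labels such as A, ADJ00, ADJ01
  - lang: string
- Splits:
  - train: 1658 examples, 678150 bytes
  - validation: 207 examples, 90955 bytes
  - test: 208 examples, 93000 bytes
- Download size: 304867 bytes
- Dataset size: 862105 bytes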
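Configuration: nso
- Features:
  - tokens: sequence of strings
  - tags: sequence of class labels, with 139 distinct labels such as ADJ01, ADJ02, ADJ03
  - lang: string
- Splits:
  - train: 1544 examples, 917356 bytes
  - validation: 193 examples, 124533 bytes
  - test: 194 examples, 126562 bytes
- Download size: 275384 bytes
- Dataset size: 1168451 bytes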
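Configuration: sot
- Features:
  - tokens: sequence of strings
  - tags: sequence of class labels, with 172 distinct labels such as ADJ01, ADJ02, ADJ03
  - lang: string
- Splits:
  - train: 1533 examples, 922621 bytes
  - validation: 191 examples, 125323 bytes
  - test: 193 examples, 113230 bytes
- Download size: 280169 bytes
- Dataset size: 1161174 bytes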
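Configuration: ssw
- Features:
  - tokens: sequence of strings
  - tags: sequence of class labels, with 92 distinct labels such as A, ADJ00, ADJ01
  - lang: string
- Splits:
  - train: 1619 examples, 714304 bytes
  - validation: 202 examples, 88181 bytes
  - test: 203 examples, 79673 bytes
- Download size: 306483 bytes
- Dataset size: 882158 bytes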
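Because the tags feature is a class-label sequence, each tag is stored as an integer id and the names come from the metadata. The sketch below illustrates that mapping with a handful of id and name pairs copied from the ssw entry; the example ids and the helper names `decode_tags` and `encode_tags` are hypothetical and are not taken from the data.

```python
# A few id -> name pairs copied from the ssw label list in the metadata;
# the full list has 92 names (ids 0 through 91).
SSW_LABELS = {0: "A", 11: "ADV", 12: "CONJ", 22: "N", 64: "PUNCT", 91: "V"}


def decode_tags(tag_ids, id2label):
    """Map integer tag ids to label names, marking ids missing from the table."""
    return [id2label.get(i, f"<unknown:{i}>") for i in tag_ids]


def encode_tags(names, id2label):
    """Map label names back to integer ids using the inverted table."""
    label2id = {name: i for i, name in id2label.items()}
    return [label2id[name] for name in names]


example_ids = [22, 91, 64]  # hypothetical tag ids, not an actual example
print(decode_tags(example_ids, SSW_LABELS))  # ['N', 'V', 'PUNCT']
```

With the Hugging Face `datasets` library, the `ClassLabel` feature offers `int2str` and `str2int` for the same purpose, so a hand-built table like this is only needed when working outside that library.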
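Configuration: tsn
- Features:
  - tokens: sequence of strings
  - tags: sequence of class labels, with 139 distinct labels such as ADJ01, ADJ02, ADJ03
  - lang: string
- Splits:
  - train: 1593 examples, 922598 bytes
  - validation: 199 examples, 117207 bytes
  - test: 200 examples, 120921 bytes
- Download size: 284279 bytes
- Dataset size: 1160726 bytes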
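Configuration: tso
- Features:
  - tokens: sequence of strings
  - tags: sequence of class labels, with 143 distinct labels such as ADJ01, ADJ02, ADJ03
  - lang: string
- Splits:
  - train: 1561 examples, 944279 bytes
  - validation: 195 examples, 109624 bytes
  - test: 196 examples, 125178 bytes
- Download size: 280301 bytes
- Dataset size: 1179081 bytes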
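The figures listed for these six configurations can be cross-checked with a little arithmetic. The sketch below copies the byte and example counts from the summary and tests whether each dataset size equals the sum of its split sizes, then computes each split's share of the examples. The dictionary and function names are illustrative only.

```python
# Per-split byte counts (train, validation, test) copied from the summary.
SPLIT_BYTES = {
    "nbl": (678150, 90955, 93000),
    "nso": (917356, 124533, 126562),
    "sot": (922621, 125323, 113230),
    "ssw": (714304, 88181, 79673),
    "tsn": (922598, 117207, 120921),
    "tso": (944279, 109624, 125178),
}

# Reported dataset sizes in bytes.
DATASET_SIZE = {
    "nbl": 862105, "nso": 1168451, "sot": 1161174,
    "ssw": 882158, "tsn": 1160726, "tso": 1179081,
}

# Per-split example counts (train, validation, test).
SPLIT_EXAMPLES = {
    "nbl": (1658, 207, 208),
    "nso": (1544, 193, 194),
    "sot": (1533, 191, 193),
    "ssw": (1619, 202, 203),
    "tsn": (1593, 199, 200),
    "tso": (1561, 195, 196),
}


def size_matches(config):
    """True if the reported dataset size equals the sum of the split sizes."""
    return sum(SPLIT_BYTES[config]) == DATASET_SIZE[config]


def split_shares(config):
    """Fraction of examples in (train, validation, test) for one configuration."""
    counts = SPLIT_EXAMPLES[config]
    total = sum(counts)
    return tuple(n / total for n in counts)


for config in SPLIT_BYTES:
    shares = ", ".join(f"{s:.3f}" for s in split_shares(config))
    print(config, size_matches(config), shares)
```

For all six configurations the sizes add up, and the shares come out close to an 80/10/10 split, although the source does not state how the splits were drawn.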



