etornam/fleurs_all
Source: Hugging Face. Updated 2025-12-09; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/etornam/fleurs_all
Resource summary:
---
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: num_samples
    dtype: int32
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: transcription
    dtype: string
  - name: raw_transcription
    dtype: string
  - name: gender
    dtype:
      class_label:
        names:
          '0': male
          '1': female
          '2': other
  - name: lang_id
    dtype:
      class_label:
        names:
          '0': af_za
          '1': am_et
          '2': ar_eg
          '3': as_in
          '4': ast_es
          '5': az_az
          '6': be_by
          '7': bg_bg
          '8': bn_in
          '9': bs_ba
          '10': ca_es
          '11': ceb_ph
          '12': ckb_iq
          '13': cmn_hans_cn
          '14': cs_cz
          '15': cy_gb
          '16': da_dk
          '17': de_de
          '18': el_gr
          '19': en_us
          '20': es_419
          '21': et_ee
          '22': fa_ir
          '23': ff_sn
          '24': fi_fi
          '25': fil_ph
          '26': fr_fr
          '27': ga_ie
          '28': gl_es
          '29': gu_in
          '30': ha_ng
          '31': he_il
          '32': hi_in
          '33': hr_hr
          '34': hu_hu
          '35': hy_am
          '36': id_id
          '37': ig_ng
          '38': is_is
          '39': it_it
          '40': ja_jp
          '41': jv_id
          '42': ka_ge
          '43': kam_ke
          '44': kea_cv
          '45': kk_kz
          '46': km_kh
          '47': kn_in
          '48': ko_kr
          '49': ky_kg
          '50': lb_lu
          '51': lg_ug
          '52': ln_cd
          '53': lo_la
          '54': lt_lt
          '55': luo_ke
          '56': lv_lv
          '57': mi_nz
          '58': mk_mk
          '59': ml_in
          '60': mn_mn
          '61': mr_in
          '62': ms_my
          '63': mt_mt
          '64': my_mm
          '65': nb_no
          '66': ne_np
          '67': nl_nl
          '68': nso_za
          '69': ny_mw
          '70': oc_fr
          '71': om_et
          '72': or_in
          '73': pa_in
          '74': pl_pl
          '75': ps_af
          '76': pt_br
          '77': ro_ro
          '78': ru_ru
          '79': sd_in
          '80': sk_sk
          '81': sl_si
          '82': sn_zw
          '83': so_so
          '84': sr_rs
          '85': sv_se
          '86': sw_ke
          '87': ta_in
          '88': te_in
          '89': tg_tj
          '90': th_th
          '91': tr_tr
          '92': uk_ua
          '93': umb_ao
          '94': ur_pk
          '95': uz_uz
          '96': vi_vn
          '97': wo_sn
          '98': xh_za
          '99': yo_ng
          '100': yue_hant_hk
          '101': zu_za
          '102': all
  - name: language
    dtype: string
  - name: lang_group_id
    dtype:
      class_label:
        names:
          '0': western_european_we
          '1': eastern_european_ee
          '2': central_asia_middle_north_african_cmn
          '3': sub_saharan_african_ssa
          '4': south_asian_sa
          '5': south_east_asian_sea
          '6': chinese_japanase_korean_cjk
  splits:
  - name: train
    num_bytes: 227088077024.558
    num_examples: 271798
  - name: validation
    num_bytes: 27456597032.204
    num_examples: 34452
  - name: test
    num_bytes: 65211223126.9
    num_examples: 77810
  download_size: 315287189080
  dataset_size: 319755897183.66205
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
license: cc
---
Expanded version of [google/fleurs](https://huggingface.co/datasets/google/fleurs).
Refer to the original dataset ([google/fleurs](https://huggingface.co/datasets/google/fleurs)) for citation and original license information.
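Because the card reports a download size of roughly 315 GB, streaming a split is often more practical than fetching everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository resolves under the name shown on this card. The helper names `stream_examples` and `summarize` are illustrative, and the duration calculation assumes `num_samples` counts audio samples at the declared 16000 Hz rate.

```python
from typing import Any, Dict, Iterator

REPO_ID = "etornam/fleurs_all"  # repository name taken from this card
SAMPLING_RATE = 16_000          # audio sampling_rate declared in the schema


def stream_examples(split: str = "validation") -> Iterator[Dict[str, Any]]:
    """Yield examples from one split without downloading the full dataset."""
    # Imported lazily so `summarize` below works without `datasets` installed.
    from datasets import Audio, load_dataset

    ds = load_dataset(REPO_ID, split=split, streaming=True)
    # Leave the audio undecoded so no audio backend is required for this sketch.
    ds = ds.cast_column("audio", Audio(decode=False))
    yield from ds


def summarize(example: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the text fields and derive a duration in seconds.

    Assumption: `num_samples` is the number of audio samples at SAMPLING_RATE.
    """
    return {
        "id": example["id"],
        "language": example["language"],
        "seconds": example["num_samples"] / SAMPLING_RATE,
        "transcription": example["transcription"],
    }
```

To try it, iterate over `stream_examples("validation")`, pass each item to `summarize`, and stop after a few examples; nothing is fetched until iteration begins.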
Dataset information:
Features:
- Name: id; dtype: int32 (32-bit integer)
- Name: num_samples; dtype: int32 (32-bit integer)
- Name: path; dtype: string
- Name: audio; dtype: audio, with sampling_rate: 16000
- Name: transcription; dtype: string
- Name: raw_transcription; dtype: string
- Name: gender; dtype: class_label, with the mapping 0: male, 1: female, 2: other
- Name: lang_id; dtype: class_label, with the following class names:
0: af_za (Afrikaans), 1: am_et (Amharic), 2: ar_eg (Arabic, Egypt), 3: as_in (Assamese), 4: ast_es (Asturian), 5: az_az (Azerbaijani), 6: be_by (Belarusian), 7: bg_bg (Bulgarian), 8: bn_in (Bengali), 9: bs_ba (Bosnian), 10: ca_es (Catalan), 11: ceb_ph (Cebuano), 12: ckb_iq (Central Kurdish, Iraq), 13: cmn_hans_cn (Mandarin Chinese, Simplified), 14: cs_cz (Czech), 15: cy_gb (Welsh), 16: da_dk (Danish), 17: de_de (German), 18: el_gr (Greek), 19: en_us (English, US), 20: es_419 (Spanish, Latin America), 21: et_ee (Estonian), 22: fa_ir (Persian), 23: ff_sn (Fula, Senegal), 24: fi_fi (Finnish), 25: fil_ph (Filipino), 26: fr_fr (French), 27: ga_ie (Irish), 28: gl_es (Galician), 29: gu_in (Gujarati), 30: ha_ng (Hausa), 31: he_il (Hebrew), 32: hi_in (Hindi), 33: hr_hr (Croatian), 34: hu_hu (Hungarian), 35: hy_am (Armenian), 36: id_id (Indonesian), 37: ig_ng (Igbo), 38: is_is (Icelandic), 39: it_it (Italian), 40: ja_jp (Japanese), 41: jv_id (Javanese), 42: ka_ge (Georgian), 43: kam_ke (Kamba), 44: kea_cv (Kabuverdianu), 45: kk_kz (Kazakh), 46: km_kh (Khmer), 47: kn_in (Kannada), 48: ko_kr (Korean), 49: ky_kg (Kyrgyz), 50: lb_lu (Luxembourgish), 51: lg_ug (Luganda), 52: ln_cd (Lingala), 53: lo_la (Lao), 54: lt_lt (Lithuanian), 55: luo_ke (Luo), 56: lv_lv (Latvian), 57: mi_nz (Maori), 58: mk_mk (Macedonian), 59: ml_in (Malayalam), 60: mn_mn (Mongolian), 61: mr_in (Marathi), 62: ms_my (Malay), 63: mt_mt (Maltese), 64: my_mm (Burmese), 65: nb_no (Norwegian Bokmål), 66: ne_np (Nepali), 67: nl_nl (Dutch), 68: nso_za (Northern Sotho), 69: ny_mw (Chichewa), 70: oc_fr (Occitan), 71: om_et (Oromo), 72: or_in (Odia), 73: pa_in (Punjabi), 74: pl_pl (Polish), 75: ps_af (Pashto), 76: pt_br (Portuguese, Brazil), 77: ro_ro (Romanian), 78: ru_ru (Russian), 79: sd_in (Sindhi), 80: sk_sk (Slovak), 81: sl_si (Slovenian), 82: sn_zw (Shona), 83: so_so (Somali), 84: sr_rs (Serbian), 85: sv_se (Swedish), 86: sw_ke (Swahili), 87: ta_in (Tamil), 88: te_in (Telugu), 89: tg_tj (Tajik), 90: th_th (Thai), 91: tr_tr (Turkish), 92: uk_ua (Ukrainian), 93: umb_ao (Umbundu), 94: ur_pk (Urdu), 95: uz_uz (Uzbek), 96: vi_vn (Vietnamese), 97: wo_sn (Wolof), 98: xh_za (Xhosa), 99: yo_ng (Yoruba), 100: yue_hant_hk (Cantonese, Traditional script, Hong Kong), 101: zu_za (Zulu), 102: all (all languages)
- Name: language; dtype: string
- Name: lang_group_id; dtype: class_label, with the following class names:
0: western_european_we (Western European), 1: eastern_european_ee (Eastern European), 2: central_asia_middle_north_african_cmn (Central Asia, Middle East, and North Africa), 3: sub_saharan_african_ssa (Sub-Saharan African), 4: south_asian_sa (South Asian), 5: south_east_asian_sea (South-East Asian), 6: chinese_japanase_korean_cjk (Chinese, Japanese, and Korean; the label is spelled this way in the schema)
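The `class_label` features are stored as integers, so the names listed above have to be looked up by index. A minimal pure-Python sketch of that lookup for the two short label sets follows; the helper names are illustrative, and when the dataset is loaded through the `datasets` library the same mapping should be available from each feature's `ClassLabel.int2str` and `ClassLabel.str2int` methods.

```python
from typing import Any, Dict, List

# Label names copied from the schema, in index order.
GENDER_NAMES: List[str] = ["male", "female", "other"]
LANG_GROUP_NAMES: List[str] = [
    "western_european_we",
    "eastern_european_ee",
    "central_asia_middle_north_african_cmn",
    "sub_saharan_african_ssa",
    "south_asian_sa",
    "south_east_asian_sea",
    "chinese_japanase_korean_cjk",  # spelled exactly as in the schema
]


def int2str(names: List[str], index: int) -> str:
    """Map an integer label to its name, rejecting out-of-range values."""
    if not 0 <= index < len(names):
        raise ValueError(f"label index {index} is outside 0..{len(names) - 1}")
    return names[index]


def str2int(names: List[str], name: str) -> int:
    """Map a label name back to its integer index."""
    return names.index(name)


def decode_labels(example: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the example with readable gender and language-group labels."""
    decoded = dict(example)
    decoded["gender"] = int2str(GENDER_NAMES, example["gender"])
    decoded["lang_group_id"] = int2str(LANG_GROUP_NAMES, example["lang_group_id"])
    return decoded
```

For instance, `str2int(LANG_GROUP_NAMES, "sub_saharan_african_ssa")` gives the integer to filter on when selecting one language group.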
Splits:
- Name: train; num_bytes: 227088077024.558; num_examples: 271798
- Name: validation; num_bytes: 27456597032.204; num_examples: 34452
- Name: test; num_bytes: 65211223126.9; num_examples: 77810
Download size: 315287189080 bytes
Dataset size: 319755897183.66205 bytes
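The split figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported on this card to compute each split's share of the examples, the mean stored size per example, and whether the per-split byte counts add up to the reported dataset size.

```python
# Figures copied from the card's split metadata.
SPLITS = {
    "train": {"num_bytes": 227088077024.558, "num_examples": 271798},
    "validation": {"num_bytes": 27456597032.204, "num_examples": 34452},
    "test": {"num_bytes": 65211223126.9, "num_examples": 77810},
}
DATASET_SIZE = 319755897183.66205  # bytes
DOWNLOAD_SIZE = 315287189080       # bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

for name, s in SPLITS.items():
    share = s["num_examples"] / total_examples
    mean_bytes = s["num_bytes"] / s["num_examples"]
    print(f"{name:<10} {share:6.1%} of examples, {mean_bytes / 1e3:8.1f} kB per example")

print(f"total examples: {total_examples}")
print(f"total size: {total_bytes / 1e9:.1f} GB, download: {DOWNLOAD_SIZE / 1e9:.1f} GB")
# The per-split byte counts should sum to the reported dataset_size.
print("split bytes match dataset_size:", abs(total_bytes - DATASET_SIZE) < 1.0)
```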
Configs:
- config_name: default, with data files:
  - train split: data/train-*
  - validation split: data/validation-*
  - test split: data/test-*
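Each split in the default config is defined by a glob pattern over files in the repository's `data/` directory. The sketch below approximates that matching with Python's standard `fnmatch` module; the Hub's own pattern resolver may differ in detail, and the shard filenames used here are hypothetical, chosen only to illustrate the patterns.

```python
from fnmatch import fnmatch
from typing import Optional

# Patterns copied from the card's `configs` section.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical filenames; the real shard names are not listed on this card.
example_files = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

for path in example_files:
    print(f"{path} -> {split_for(path)}")
```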
License: cc
Provider:
etornam



