amuseix/fleurs_full_text
Source: Hugging Face. Updated 2024-05-14. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/amuseix/fleurs_full_text
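Summary:
The dataset has two configurations, bg_bg and edited. Each configuration provides audio, transcription, gender, language ID, language, and language group ID features, among others. The data is divided into train, validation, and test splits, and the byte count and number of examples are listed for each split.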
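A minimal sketch of how one split of one configuration might be loaded with the Hugging Face `datasets` library. The repository id is taken from the path in the download link, and the configuration and split names come from the summary. The helper name `load_split` is hypothetical, and the sketch assumes the repository is still publicly available under that id.

```python
REPO_ID = "amuseix/fleurs_full_text"
CONFIGS = ("bg_bg", "edited")
SPLITS = ("train", "validation", "test")


def load_split(config: str, split: str = "train"):
    """Load one split of one configuration from the Hugging Face Hub."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}, expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}, expected one of {SPLITS}")
    # Lazy import: the checks above run even when `datasets` is not installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)
```

Calling `load_split("bg_bg", "validation")` triggers a download, so the name checks run first to fail fast on a typo.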
Provider:
amuseix
Original information summary
Dataset overview
Configuration name: bg_bg
Features:
- id: int32
- num_samples: int32
- path: string
- audio:
  - sampling_rate: 16000
- transcription: string
- raw_transcription: string
- gender:
  - 0: male
  - 1: female
  - 2: other
- lang_id:
  - 0: af_za
  - 1: am_et
  - 2: ar_eg
  - 3: as_in
  - 4: ast_es
  - 5: az_az
  - 6: be_by
  - 7: bg_bg
  - 8: bn_in
  - 9: bs_ba
  - 10: ca_es
  - 11: ceb_ph
  - 12: ckb_iq
  - 13: cmn_hans_cn
  - 14: cs_cz
  - 15: cy_gb
  - 16: da_dk
  - 17: de_de
  - 18: el_gr
  - 19: en_us
  - 20: es_419
  - 21: et_ee
  - 22: fa_ir
  - 23: ff_sn
  - 24: fi_fi
  - 25: fil_ph
  - 26: fr_fr
  - 27: ga_ie
  - 28: gl_es
  - 29: gu_in
  - 30: ha_ng
  - 31: he_il
  - 32: hi_in
  - 33: hr_hr
  - 34: hu_hu
  - 35: hy_am
  - 36: id_id
  - 37: ig_ng
  - 38: is_is
  - 39: it_it
  - 40: ja_jp
  - 41: jv_id
  - 42: ka_ge
  - 43: kam_ke
  - 44: kea_cv
  - 45: kk_kz
  - 46: km_kh
  - 47: kn_in
  - 48: ko_kr
  - 49: ky_kg
  - 50: lb_lu
  - 51: lg_ug
  - 52: ln_cd
  - 53: lo_la
  - 54: lt_lt
  - 55: luo_ke
  - 56: lv_lv
  - 57: mi_nz
  - 58: mk_mk
  - 59: ml_in
  - 60: mn_mn
  - 61: mr_in
  - 62: ms_my
  - 63: mt_mt
  - 64: my_mm
  - 65: nb_no
  - 66: ne_np
  - 67: nl_nl
  - 68: nso_za
  - 69: ny_mw
  - 70: oc_fr
  - 71: om_et
  - 72: or_in
  - 73: pa_in
  - 74: pl_pl
  - 75: ps_af
  - 76: pt_br
  - 77: ro_ro
  - 78: ru_ru
  - 79: sd_in
  - 80: sk_sk
  - 81: sl_si
  - 82: sn_zw
  - 83: so_so
  - 84: sr_rs
  - 85: sv_se
  - 86: sw_ke
  - 87: ta_in
  - 88: te_in
  - 89: tg_tj
  - 90: th_th
  - 91: tr_tr
  - 92: uk_ua
  - 93: umb_ao
  - 94: ur_pk
  - 95: uz_uz
  - 96: vi_vn
  - 97: wo_sn
  - 98: xh_za
  - 99: yo_ng
  - 100: yue_hant_hk
  - 101: zu_za
  - 102: all
- language: string
- lang_group_id:
  - 0: western_european_we
  - 1: eastern_european_ee
  - 2: central_asia_middle_north_african_cmn
  - 3: sub_saharan_african_ssa
  - 4: south_asian_sa
  - 5: south_east_asian_sea
  - 6: chinese_japanase_korean_cjk
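The gender and lang_group_id features above are stored as integer class ids. A small sketch of turning those ids back into the listed names could look like this. The label lists are copied from the feature listing; the helper names `decode` and `describe` and the sample row are hypothetical.

```python
GENDER = ["male", "female", "other"]
LANG_GROUP = [
    "western_european_we",
    "eastern_european_ee",
    "central_asia_middle_north_african_cmn",
    "sub_saharan_african_ssa",
    "south_asian_sa",
    "south_east_asian_sea",
    "chinese_japanase_korean_cjk",  # spelled exactly as in the listing
]


def decode(names: list, index: int) -> str:
    """Map an integer class id to its label, rejecting out-of-range ids."""
    if not 0 <= index < len(names):
        raise ValueError(f"class id {index} is outside 0..{len(names) - 1}")
    return names[index]


def describe(row: dict) -> dict:
    """Return a copy of the row with the class ids replaced by label names."""
    return {
        **row,
        "gender": decode(GENDER, row["gender"]),
        "lang_group_id": decode(LANG_GROUP, row["lang_group_id"]),
    }


# Hypothetical row, for illustration only (not taken from the dataset).
sample = {"id": 0, "gender": 1, "lang_group_id": 1}
print(describe(sample))
```

If the columns load as `ClassLabel` features in the `datasets` library, its `int2str` method should provide the same mapping without hand-copied lists.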
Splits:
- train:
  - num_bytes: 2191261618.977
  - num_examples: 2973
- validation:
  - num_bytes: 243647311.0
  - num_examples: 395
- test:
  - num_bytes: 428466577.0
  - num_examples: 658
Dataset size:
- download_size: 2827772466
- dataset_size: 2863375506.977
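As a quick consistency check on the bg_bg figures above, the sketch below sums the per-split byte counts, compares the total with dataset_size, and computes each split's share of the examples. All numbers are copied from the listing. `Decimal` is used so that the fractional byte counts compare exactly.

```python
from decimal import Decimal

SPLITS = {
    "train": {"num_bytes": Decimal("2191261618.977"), "num_examples": 2973},
    "validation": {"num_bytes": Decimal("243647311.0"), "num_examples": 395},
    "test": {"num_bytes": Decimal("428466577.0"), "num_examples": 658},
}
DATASET_SIZE = Decimal("2863375506.977")

# Sum the per-split figures.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Share of examples per split, as a percentage rounded to one decimal place.
shares = {
    name: round(100 * s["num_examples"] / total_examples, 1)
    for name, s in SPLITS.items()
}

print(total_bytes == DATASET_SIZE)  # True
print(total_examples)               # 4026
print(shares)                       # {'train': 73.8, 'validation': 9.8, 'test': 16.3}
```

The split byte counts add up to dataset_size exactly, and the 4026 examples divide roughly 74% / 10% / 16% across train, validation, and test.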
Configuration name: edited
Features:
- id: int32
- num_samples: int32
- path: string
- audio:
  - sampling_rate: 16000
- transcription: string
- raw_transcription: string
- gender:
  - 0: male
  - 1: female
  - 2: other
- lang_id:
  - 0: af_za
  - 1: am_et
  - 2: ar_eg
  - 3: as_in
  - 4: ast_es
  - 5: az_az
  - 6: be_by
  - 7: bg_bg
  - 8: bn_in
  - 9: bs_ba
  - 10: ca_es
  - 11: ceb_ph
  - 12: ckb_iq
  - 13: cmn_hans_cn
  - 14: cs_cz
  - 15: cy_gb
  - 16: da_dk
  - 17: de_de
  - 18: el_gr
  - 19: en_us
  - 20: es_419
  - 21: et_ee
  - 22: fa_ir
  - 23: ff_sn
  - 24: fi_fi
  - 25: fil_ph
  - 26: fr_fr
  - 27: ga_ie
  - 28: gl_es
  - 29: gu_in
  - 30: ha_ng
  - 31: he_il
  - 32: hi_in
  - 33: hr_hr
  - 34: hu_hu
  - 35: hy_am
  - 36: id_id
  - 37: ig_ng
  - 38: is_is
  - 39: it_it
  - 40: ja_jp
  - 41: jv_id
  - 42: ka_ge
  - 43: kam_ke
  - 44: kea_cv
  - 45: kk_kz
  - 46: km_kh
  - 47: kn_in
  - 48: ko_kr
  - 49: ky_kg
  - 50: lb_lu
  - 51: lg_ug
  - 52: ln_cd
  - 53: lo_la
  - 54: lt_lt
  - 55: luo_ke
  - 56: lv_lv
  - 57: mi_nz
  - 58: mk_mk
  - 59: ml_in
  - 60: mn_mn
  - 61: mr_in
  - 62: ms_my
  - 63: mt_mt
  - 64: my_mm
  - 65: nb_no
  - 66: ne_np
  - 67: nl_nl
  - 68: nso_za
  - 69: ny_mw
  - 70: oc_fr
  - 71: om_et
  - 72: or_in
  - 73: pa_in
  - 74: pl_pl
  - 75: ps_af
  - 76: pt_br
  - 77: ro_ro
  - 78: ru_ru
  - 79: sd_in
  - 80: sk_sk
  - 81: sl_si
  - 82: sn_zw
  - 83: so_so
  - 84: sr_rs
  - 85: sv_se
  - 86: sw_ke
  - 87: ta_in
  - 88: te_in
  - 89: tg_tj
  - 90: th_th
  - 91: tr_tr
  - 92: uk_ua
  - 93: umb_ao
  - 94: ur_pk
  - 95: uz_uz
  - 96: vi_vn
  - 97: wo_sn
  - 98: xh_za
  - 99: yo_ng
  - 100: yue_hant_hk
  - 101: zu_za
  - 102: all
- language: string
- lang_group_id:
  - 0: western_european_we
  - 1: eastern_european_ee
  - 2: central_asia_middle_north_african_cmn
  - 3: sub_saharan_african_ssa
  - 4: south_asian_sa
  - 5: south_east_asian_sea
  - 6: chinese_japanase_korean_cjk
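The feature list includes both num_samples and an audio sampling_rate of 16000, which suggests a clip's duration can be derived without decoding the audio. The sketch below assumes that num_samples counts audio samples at the listed sampling rate, which the listing does not state explicitly. The helper name `duration_seconds` and the sample rows are hypothetical.

```python
SAMPLING_RATE = 16_000  # Hz, as listed under the audio feature


def duration_seconds(num_samples: int, sampling_rate: int = SAMPLING_RATE) -> float:
    """Clip length in seconds, assuming num_samples counts audio samples."""
    if num_samples < 0 or sampling_rate <= 0:
        raise ValueError("num_samples must be >= 0 and sampling_rate > 0")
    return num_samples / sampling_rate


# Hypothetical rows, for illustration only (not taken from the dataset).
rows = [
    {"id": 0, "num_samples": 160_000},
    {"id": 1, "num_samples": 88_000},
]

durations = [duration_seconds(r["num_samples"]) for r in rows]
print(durations)       # [10.0, 5.5]
print(sum(durations))  # 15.5
```

A derived duration like this is typically useful for filtering overly long clips or for reporting total hours of audio per split.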
Splits:
- train:
  - num_bytes: 2167335360.444
  - num_examples: 2973
- validation:
  - num_bytes: 243647311.0
  - num_examples: 395
- test:
  - num_bytes: 428466577.0
  - num_examples: 658
Dataset size:
- download_size: 2802251940
- dataset_size: 2839449248.444
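To see where the two configurations differ, the listed sizes can be subtracted field by field. The sketch below uses only the figures given for bg_bg and edited. The dictionary layout and the name `delta` are illustrative choices.

```python
from decimal import Decimal

SIZES = {
    "bg_bg": {
        "train": Decimal("2191261618.977"),
        "validation": Decimal("243647311.0"),
        "test": Decimal("428466577.0"),
        "download_size": Decimal("2827772466"),
        "dataset_size": Decimal("2863375506.977"),
    },
    "edited": {
        "train": Decimal("2167335360.444"),
        "validation": Decimal("243647311.0"),
        "test": Decimal("428466577.0"),
        "download_size": Decimal("2802251940"),
        "dataset_size": Decimal("2839449248.444"),
    },
}

# Byte difference (bg_bg minus edited) for every listed field.
delta = {
    field: SIZES["bg_bg"][field] - SIZES["edited"][field]
    for field in SIZES["bg_bg"]
}

for field, value in delta.items():
    print(f"{field}: {value}")
```

By these figures the validation and test splits are byte-for-byte the same size in both configurations, and the whole dataset_size gap of 23926258.533 bytes comes from the train split. The example counts are identical as well, so the edited configuration appears to change the content of training examples rather than their number. The listing itself does not say what was edited.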



