
amuseix/fleurs_full_text

Source: Hugging Face. Updated 2024-05-14. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/amuseix/fleurs_full_text
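Resource summary:
The dataset contains two configurations, bg_bg and edited. Each configuration includes the features audio, transcription, gender, language ID (lang_id), language, and language group ID (lang_group_id), among others. The data is split into training, validation, and test sets, and the byte count and number of examples are given for each split.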
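
A minimal loading sketch, assuming the dataset is still published under the ID `amuseix/fleurs_full_text` and that the Hugging Face `datasets` library is installed. The helper name `load_fleurs_split` is hypothetical; the configuration and split names are the ones given in the summary.

```python
CONFIGS = ("bg_bg", "edited")             # the two configurations named in the summary
SPLITS = ("train", "validation", "test")  # the three splits named in the summary


def load_fleurs_split(config: str, split: str):
    """Load one split of one configuration (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the argument checks work even without the library.
    from datasets import load_dataset
    return load_dataset("amuseix/fleurs_full_text", config, split=split)
```

Calling `load_fleurs_split("bg_bg", "validation")` would then return the validation split of the bg_bg configuration. This needs network access, and decoding the audio column may require additional audio dependencies.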

Provider:
amuseix
Summary of original information

Dataset overview

Configuration name: bg_bg

Features:

  • id: int32
  • num_samples: int32
  • path: string
  • audio:
    • sampling_rate: 16000
  • transcription: string
  • raw_transcription: string
  • gender:
    • 0: male
    • 1: female
    • 2: other
  • lang_id:
    • 0: af_za
    • 1: am_et
    • 2: ar_eg
    • 3: as_in
    • 4: ast_es
    • 5: az_az
    • 6: be_by
    • 7: bg_bg
    • 8: bn_in
    • 9: bs_ba
    • 10: ca_es
    • 11: ceb_ph
    • 12: ckb_iq
    • 13: cmn_hans_cn
    • 14: cs_cz
    • 15: cy_gb
    • 16: da_dk
    • 17: de_de
    • 18: el_gr
    • 19: en_us
    • 20: es_419
    • 21: et_ee
    • 22: fa_ir
    • 23: ff_sn
    • 24: fi_fi
    • 25: fil_ph
    • 26: fr_fr
    • 27: ga_ie
    • 28: gl_es
    • 29: gu_in
    • 30: ha_ng
    • 31: he_il
    • 32: hi_in
    • 33: hr_hr
    • 34: hu_hu
    • 35: hy_am
    • 36: id_id
    • 37: ig_ng
    • 38: is_is
    • 39: it_it
    • 40: ja_jp
    • 41: jv_id
    • 42: ka_ge
    • 43: kam_ke
    • 44: kea_cv
    • 45: kk_kz
    • 46: km_kh
    • 47: kn_in
    • 48: ko_kr
    • 49: ky_kg
    • 50: lb_lu
    • 51: lg_ug
    • 52: ln_cd
    • 53: lo_la
    • 54: lt_lt
    • 55: luo_ke
    • 56: lv_lv
    • 57: mi_nz
    • 58: mk_mk
    • 59: ml_in
    • 60: mn_mn
    • 61: mr_in
    • 62: ms_my
    • 63: mt_mt
    • 64: my_mm
    • 65: nb_no
    • 66: ne_np
    • 67: nl_nl
    • 68: nso_za
    • 69: ny_mw
    • 70: oc_fr
    • 71: om_et
    • 72: or_in
    • 73: pa_in
    • 74: pl_pl
    • 75: ps_af
    • 76: pt_br
    • 77: ro_ro
    • 78: ru_ru
    • 79: sd_in
    • 80: sk_sk
    • 81: sl_si
    • 82: sn_zw
    • 83: so_so
    • 84: sr_rs
    • 85: sv_se
    • 86: sw_ke
    • 87: ta_in
    • 88: te_in
    • 89: tg_tj
    • 90: th_th
    • 91: tr_tr
    • 92: uk_ua
    • 93: umb_ao
    • 94: ur_pk
    • 95: uz_uz
    • 96: vi_vn
    • 97: wo_sn
    • 98: xh_za
    • 99: yo_ng
    • 100: yue_hant_hk
    • 101: zu_za
    • 102: all
  • language: string
  • lang_group_id:
    • 0: western_european_we
    • 1: eastern_european_ee
    • 2: central_asia_middle_north_african_cmn
    • 3: sub_saharan_african_ssa
    • 4: south_asian_sa
    • 5: south_east_asian_sea
    • 6: chinese_japanase_korean_cjk
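
The integer-coded features above can be mapped back to their names with plain lists. The sketch below does that for `gender` and `lang_group_id`, and derives a clip duration on the assumption that `num_samples` counts audio samples at the listed 16000 Hz sampling rate. The function name `describe` and the sample record are hypothetical.

```python
SAMPLING_RATE = 16_000  # Hz, from the audio feature above

GENDER = ["male", "female", "other"]
LANG_GROUP = [
    "western_european_we",
    "eastern_european_ee",
    "central_asia_middle_north_african_cmn",
    "sub_saharan_african_ssa",
    "south_asian_sa",
    "south_east_asian_sea",
    "chinese_japanase_korean_cjk",  # spelling kept exactly as listed
]


def describe(example: dict) -> dict:
    """Return readable labels plus the clip duration in seconds."""
    return {
        "gender": GENDER[example["gender"]],
        "lang_group": LANG_GROUP[example["lang_group_id"]],
        # Assumption: num_samples is the number of audio samples in the clip.
        "duration_s": example["num_samples"] / SAMPLING_RATE,
    }


# Hypothetical record; the values are invented for illustration only.
example = {"gender": 1, "lang_group_id": 1, "num_samples": 160_000}
print(describe(example))
```

When the data is loaded through the `datasets` library, class-label features normally carry their own name mapping, so hard-coded lists like these are only a convenience for working with exported records.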

Split information:

  • train:
    • num_bytes: 2191261618.977
    • num_examples: 2973
  • validation:
    • num_bytes: 243647311.0
    • num_examples: 395
  • test:
    • num_bytes: 428466577.0
    • num_examples: 658
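
As a quick arithmetic check on the example counts above, the share of each split can be computed directly. This is only a sketch over the listed numbers.

```python
# Example counts copied from the split information above.
splits = {"train": 2973, "validation": 395, "test": 658}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total: {total} examples")
for name, share in shares.items():
    print(f"{name}: {splits[name]} examples ({share:.1%})")
```

The counts sum to 4026 examples, of which roughly 74% are training data.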

Dataset size:

  • download_size: 2827772466
  • dataset_size: 2863375506.977
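
The listed `dataset_size` for bg_bg equals the sum of the three per-split `num_bytes` values. A short sketch that verifies this and converts the sizes to decimal gigabytes:

```python
import math

# Byte counts copied from the bg_bg configuration above.
split_bytes = {
    "train": 2191261618.977,
    "validation": 243647311.0,
    "test": 428466577.0,
}
dataset_size = 2863375506.977
download_size = 2827772466

total_bytes = sum(split_bytes.values())
# Allow a tiny tolerance because the values are floating-point numbers.
sizes_match = math.isclose(total_bytes, dataset_size, abs_tol=1e-3)

print(f"split sum matches dataset_size: {sizes_match}")
print(f"dataset_size:  {dataset_size / 1e9:.2f} GB")
print(f"download_size: {download_size / 1e9:.2f} GB")
```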

Configuration name: edited

Features:

  • id: int32
  • num_samples: int32
  • path: string
  • audio:
    • sampling_rate: 16000
  • transcription: string
  • raw_transcription: string
  • gender:
    • 0: male
    • 1: female
    • 2: other
  • lang_id:
    • 0: af_za
    • 1: am_et
    • 2: ar_eg
    • 3: as_in
    • 4: ast_es
    • 5: az_az
    • 6: be_by
    • 7: bg_bg
    • 8: bn_in
    • 9: bs_ba
    • 10: ca_es
    • 11: ceb_ph
    • 12: ckb_iq
    • 13: cmn_hans_cn
    • 14: cs_cz
    • 15: cy_gb
    • 16: da_dk
    • 17: de_de
    • 18: el_gr
    • 19: en_us
    • 20: es_419
    • 21: et_ee
    • 22: fa_ir
    • 23: ff_sn
    • 24: fi_fi
    • 25: fil_ph
    • 26: fr_fr
    • 27: ga_ie
    • 28: gl_es
    • 29: gu_in
    • 30: ha_ng
    • 31: he_il
    • 32: hi_in
    • 33: hr_hr
    • 34: hu_hu
    • 35: hy_am
    • 36: id_id
    • 37: ig_ng
    • 38: is_is
    • 39: it_it
    • 40: ja_jp
    • 41: jv_id
    • 42: ka_ge
    • 43: kam_ke
    • 44: kea_cv
    • 45: kk_kz
    • 46: km_kh
    • 47: kn_in
    • 48: ko_kr
    • 49: ky_kg
    • 50: lb_lu
    • 51: lg_ug
    • 52: ln_cd
    • 53: lo_la
    • 54: lt_lt
    • 55: luo_ke
    • 56: lv_lv
    • 57: mi_nz
    • 58: mk_mk
    • 59: ml_in
    • 60: mn_mn
    • 61: mr_in
    • 62: ms_my
    • 63: mt_mt
    • 64: my_mm
    • 65: nb_no
    • 66: ne_np
    • 67: nl_nl
    • 68: nso_za
    • 69: ny_mw
    • 70: oc_fr
    • 71: om_et
    • 72: or_in
    • 73: pa_in
    • 74: pl_pl
    • 75: ps_af
    • 76: pt_br
    • 77: ro_ro
    • 78: ru_ru
    • 79: sd_in
    • 80: sk_sk
    • 81: sl_si
    • 82: sn_zw
    • 83: so_so
    • 84: sr_rs
    • 85: sv_se
    • 86: sw_ke
    • 87: ta_in
    • 88: te_in
    • 89: tg_tj
    • 90: th_th
    • 91: tr_tr
    • 92: uk_ua
    • 93: umb_ao
    • 94: ur_pk
    • 95: uz_uz
    • 96: vi_vn
    • 97: wo_sn
    • 98: xh_za
    • 99: yo_ng
    • 100: yue_hant_hk
    • 101: zu_za
    • 102: all
  • language: string
  • lang_group_id:
    • 0: western_european_we
    • 1: eastern_european_ee
    • 2: central_asia_middle_north_african_cmn
    • 3: sub_saharan_african_ssa
    • 4: south_asian_sa
    • 5: south_east_asian_sea
    • 6: chinese_japanase_korean_cjk

Split information:

  • train:
    • num_bytes: 2167335360.444
    • num_examples: 2973
  • validation:
    • num_bytes: 243647311.0
    • num_examples: 395
  • test:
    • num_bytes: 428466577.0
    • num_examples: 658

Dataset size:

  • download_size: 2802251940
  • dataset_size: 2839449248.444
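
The two configurations list the same example counts and identical validation and test byte counts, so the size difference comes from the train split alone. The sketch below compares the listed figures; the source does not say what was changed in the edited configuration, so the code only reports the difference.

```python
# Figures copied from the two configurations listed above.
bg_bg = {
    "train_bytes": 2191261618.977,
    "download_size": 2827772466,
    "dataset_size": 2863375506.977,
}
edited = {
    "train_bytes": 2167335360.444,
    "download_size": 2802251940,
    "dataset_size": 2839449248.444,
}

# Positive values mean the edited configuration is smaller.
diffs = {key: bg_bg[key] - edited[key] for key in bg_bg}

for key, diff in diffs.items():
    print(f"{key}: edited is {diff / 1e6:.1f} MB smaller")
```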