changelinglab/fleurs24-lid
Source: Hugging Face. Updated 2026-01-26; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/changelinglab/fleurs24-lid
Description:
---
license:
- cc-by-4.0
arxiv: 2601.14046
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: num_samples
    dtype: int32
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        decode: false
  - name: transcription
    dtype: string
  - name: raw_transcription
    dtype: string
  - name: gender
    dtype:
      class_label:
        names:
          '0': male
          '1': female
          '2': other
  - name: lang_id
    dtype:
      class_label:
        names:
          '0': af_za
          '1': am_et
          '2': ar_eg
          '3': as_in
          '4': ast_es
          '5': az_az
          '6': be_by
          '7': bg_bg
          '8': bn_in
          '9': bs_ba
          '10': ca_es
          '11': ceb_ph
          '12': ckb_iq
          '13': cmn_hans_cn
          '14': cs_cz
          '15': cy_gb
          '16': da_dk
          '17': de_de
          '18': el_gr
          '19': en_us
          '20': es_419
          '21': et_ee
          '22': fa_ir
          '23': ff_sn
          '24': fi_fi
          '25': fil_ph
          '26': fr_fr
          '27': ga_ie
          '28': gl_es
          '29': gu_in
          '30': ha_ng
          '31': he_il
          '32': hi_in
          '33': hr_hr
          '34': hu_hu
          '35': hy_am
          '36': id_id
          '37': ig_ng
          '38': is_is
          '39': it_it
          '40': ja_jp
          '41': jv_id
          '42': ka_ge
          '43': kam_ke
          '44': kea_cv
          '45': kk_kz
          '46': km_kh
          '47': kn_in
          '48': ko_kr
          '49': ky_kg
          '50': lb_lu
          '51': lg_ug
          '52': ln_cd
          '53': lo_la
          '54': lt_lt
          '55': luo_ke
          '56': lv_lv
          '57': mi_nz
          '58': mk_mk
          '59': ml_in
          '60': mn_mn
          '61': mr_in
          '62': ms_my
          '63': mt_mt
          '64': my_mm
          '65': nb_no
          '66': ne_np
          '67': nl_nl
          '68': nso_za
          '69': ny_mw
          '70': oc_fr
          '71': om_et
          '72': or_in
          '73': pa_in
          '74': pl_pl
          '75': ps_af
          '76': pt_br
          '77': ro_ro
          '78': ru_ru
          '79': sd_in
          '80': sk_sk
          '81': sl_si
          '82': sn_zw
          '83': so_so
          '84': sr_rs
          '85': sv_se
          '86': sw_ke
          '87': ta_in
          '88': te_in
          '89': tg_tj
          '90': th_th
          '91': tr_tr
          '92': uk_ua
          '93': umb_ao
          '94': ur_pk
          '95': uz_uz
          '96': vi_vn
          '97': wo_sn
          '98': xh_za
          '99': yo_ng
          '100': yue_hant_hk
          '101': zu_za
          '102': all
  - name: language
    dtype: string
  - name: lang_group_id
    dtype:
      class_label:
        names:
          '0': western_european_we
          '1': eastern_european_ee
          '2': central_asia_middle_north_african_cmn
          '3': sub_saharan_african_ssa
          '4': south_asian_sa
          '5': south_east_asian_sea
          '6': chinese_japanase_korean_cjk
  - name: target
    dtype: int64
  splits:
  - name: train
    num_bytes: 2293092966
    num_examples: 4800
  - name: validation
    num_bytes: 1057606132
    num_examples: 2400
  - name: test
    num_bytes: 2228672821
    num_examples: 4800
  download_size: 5142533844
  dataset_size: 5579371919
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# FLEURS
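The metadata above stores `gender` and `lang_group_id` as integer class labels and keeps `audio` undecoded (`decode: false`). The following is a minimal sketch of how one might read an example, not an official loader. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the page title (`changelinglab/fleurs24-lid`). The helper names `describe`, `save_audio`, and `load_split` are hypothetical, and the `bytes`/`path` layout of the audio field and the `.wav` fallback extension are assumptions.

```python
from pathlib import Path

# Label names copied from the dataset card metadata.
GENDER_NAMES = ["male", "female", "other"]
LANG_GROUP_NAMES = [
    "western_european_we",
    "eastern_european_ee",
    "central_asia_middle_north_african_cmn",
    "sub_saharan_african_ssa",
    "south_asian_sa",
    "south_east_asian_sea",
    "chinese_japanase_korean_cjk",  # spelled as it appears in the card
]


def describe(example):
    """Turn the integer class labels of one example into readable names."""
    return {
        "id": example["id"],
        "language": example["language"],
        "gender": GENDER_NAMES[example["gender"]],
        "lang_group": LANG_GROUP_NAMES[example["lang_group_id"]],
    }


def save_audio(example, out_dir):
    """Write the undecoded audio bytes of one example to disk.

    Assumption: with `decode: false`, the audio field is a dict holding
    "bytes" and "path". The ".wav" fallback name is a guess, not a fact
    taken from the card.
    """
    audio = example["audio"]
    name = Path(audio["path"] or f"{example['id']}.wav").name
    target = Path(out_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(audio["bytes"])
    return target


def load_split(split="validation"):
    """Stream one split from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # lazy import keeps the helpers usable offline

    return load_dataset("changelinglab/fleurs24-lid", split=split, streaming=True)


# Example usage (requires network access):
#   first = next(iter(load_split("validation")))
#   print(describe(first))
#   save_audio(first, "samples")
```

Streaming avoids downloading the roughly 5 GB of Parquet files up front. Once a split is loaded, its `features` should expose the same label names through `ClassLabel.int2str`, which is less error-prone than the hard-coded lists used in this sketch.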
## License
All datasets are licensed under CC BY 4.0.
## Citation
You can access the FLEURS paper at https://arxiv.org/abs/2205.12446.
Please cite the paper when referencing the FLEURS corpus as:
```
@article{fleurs2022arxiv,
  title = {FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech},
  author = {Conneau, Alexis and Ma, Min and Khanuja, Simran and Zhang, Yu and Axelrod, Vera and Dalmia, Siddharth and Riesa, Jason and Rivera, Clara and Bapna, Ankur},
  journal = {arXiv preprint arXiv:2205.12446},
  url = {https://arxiv.org/abs/2205.12446},
  year = {2022},
}
```
You can use this dataset with our benchmarking toolkit, PRiSM, available at https://github.com/changelinglab/prism:
```
@misc{prism2026,
title={PRiSM: Benchmarking Phone Realization in Speech Models},
author={Shikhar Bharadwaj and Chin-Jou Li and Yoonjae Kim and Kwanghee Choi and Eunjung Yeo and Ryan Soh-Eun Shim and Hanyu Zhou and Brendon Boldt and Karen Rosero Jacome and Kalvin Chang and Darsh Agrawal and Keer Xu and Chao-Han Huck Yang and Jian Zhu and Shinji Watanabe and David R. Mortensen},
year={2026},
eprint={2601.14046},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.14046},
}
```
Provided by:
changelinglab



