kargaranamir/glotlid-wordlists-backup
Source: Hugging Face. Updated 2026-03-12; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/kargaranamir/glotlid-wordlists-backup
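
Each config in the resource description on this page pairs a name such as `aai_Latn` with one text file under `data-0.95/`. The sketch below shows how such a name could be turned into a file path and a direct download URL. It rests on assumptions not stated on this page: that names follow a three-letter language code plus a four-letter script code (as every listed entry appears to), that the files sit on a `main` revision, and that the mirror serves the usual Hugging Face `resolve/<revision>/<path>` route. The helper names are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote

# Values copied from this page. The "main" revision used below is an assumption.
MIRROR = "https://hf-mirror.com"
REPO_ID = "kargaranamir/glotlid-wordlists-backup"
DATA_DIR = "data-0.95"


def split_config_name(config_name: str) -> Tuple[str, str]:
    """Split a config name such as 'abk_Cyrl' into (language, script)."""
    lang, sep, script = config_name.partition("_")
    # Assumed shape: 3-letter language code, underscore, 4-letter script code.
    if not sep or len(lang) != 3 or len(script) != 4:
        raise ValueError(f"unexpected config name: {config_name!r}")
    return lang, script


def wordlist_path(config_name: str) -> str:
    """Return the repo-relative path listed for a config's train split."""
    split_config_name(config_name)  # validate the name before building a path
    return f"{DATA_DIR}/{config_name}.txt"


def wordlist_url(config_name: str, revision: str = "main") -> str:
    """Build a direct file URL (hypothetical helper; route is an assumption)."""
    path = quote(wordlist_path(config_name))
    return f"{MIRROR}/datasets/{REPO_ID}/resolve/{revision}/{path}"


if __name__ == "__main__":
    print(wordlist_path("aai_Latn"))
    print(wordlist_url("abk_Cyrl"))
```

If the `datasets` library is installed, passing the config name as the second argument to `load_dataset` together with `split="train"` should select the same file without building URLs by hand.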
Resource description:
---
license: apache-2.0
configs:
- config_name: aai_Latn
data_files:
- split: train
path: data-0.95/aai_Latn.txt
- config_name: aak_Latn
data_files:
- split: train
path: data-0.95/aak_Latn.txt
- config_name: aau_Latn
data_files:
- split: train
path: data-0.95/aau_Latn.txt
- config_name: aaz_Latn
data_files:
- split: train
path: data-0.95/aaz_Latn.txt
- config_name: aba_Latn
data_files:
- split: train
path: data-0.95/aba_Latn.txt
- config_name: abi_Latn
data_files:
- split: train
path: data-0.95/abi_Latn.txt
- config_name: abk_Cyrl
data_files:
- split: train
path: data-0.95/abk_Cyrl.txt
- config_name: abn_Latn
data_files:
- split: train
path: data-0.95/abn_Latn.txt
- config_name: abq_Cyrl
data_files:
- split: train
path: data-0.95/abq_Cyrl.txt
- config_name: abs_Latn
data_files:
- split: train
path: data-0.95/abs_Latn.txt
- config_name: abt_Latn
data_files:
- split: train
path: data-0.95/abt_Latn.txt
- config_name: abx_Latn
data_files:
- split: train
path: data-0.95/abx_Latn.txt
- config_name: aby_Latn
data_files:
- split: train
path: data-0.95/aby_Latn.txt
- config_name: abz_Latn
data_files:
- split: train
path: data-0.95/abz_Latn.txt
- config_name: aca_Latn
data_files:
- split: train
path: data-0.95/aca_Latn.txt
- config_name: acd_Latn
data_files:
- split: train
path: data-0.95/acd_Latn.txt
- config_name: ace_Arab
data_files:
- split: train
path: data-0.95/ace_Arab.txt
- config_name: ace_Latn
data_files:
- split: train
path: data-0.95/ace_Latn.txt
- config_name: acf_Latn
data_files:
- split: train
path: data-0.95/acf_Latn.txt
- config_name: ach_Latn
data_files:
- split: train
path: data-0.95/ach_Latn.txt
- config_name: acm_Arab
data_files:
- split: train
path: data-0.95/acm_Arab.txt
- config_name: acn_Latn
data_files:
- split: train
path: data-0.95/acn_Latn.txt
- config_name: acr_Latn
data_files:
- split: train
path: data-0.95/acr_Latn.txt
- config_name: acu_Latn
data_files:
- split: train
path: data-0.95/acu_Latn.txt
- config_name: ada_Latn
data_files:
- split: train
path: data-0.95/ada_Latn.txt
- config_name: ade_Latn
data_files:
- split: train
path: data-0.95/ade_Latn.txt
- config_name: adh_Latn
data_files:
- split: train
path: data-0.95/adh_Latn.txt
- config_name: adi_Latn
data_files:
- split: train
path: data-0.95/adi_Latn.txt
- config_name: adj_Latn
data_files:
- split: train
path: data-0.95/adj_Latn.txt
- config_name: adl_Latn
data_files:
- split: train
path: data-0.95/adl_Latn.txt
- config_name: ady_Cyrl
data_files:
- split: train
path: data-0.95/ady_Cyrl.txt
- config_name: adz_Latn
data_files:
- split: train
path: data-0.95/adz_Latn.txt
- config_name: aeb_Arab
data_files:
- split: train
path: data-0.95/aeb_Arab.txt
- config_name: aer_Latn
data_files:
- split: train
path: data-0.95/aer_Latn.txt
- config_name: aeu_Latn
data_files:
- split: train
path: data-0.95/aeu_Latn.txt
- config_name: aey_Latn
data_files:
- split: train
path: data-0.95/aey_Latn.txt
- config_name: afr_Latn
data_files:
- split: train
path: data-0.95/afr_Latn.txt
- config_name: agd_Latn
data_files:
- split: train
path: data-0.95/agd_Latn.txt
- config_name: agg_Latn
data_files:
- split: train
path: data-0.95/agg_Latn.txt
- config_name: agm_Latn
data_files:
- split: train
path: data-0.95/agm_Latn.txt
- config_name: agn_Latn
data_files:
- split: train
path: data-0.95/agn_Latn.txt
- config_name: agr_Latn
data_files:
- split: train
path: data-0.95/agr_Latn.txt
- config_name: agt_Latn
data_files:
- split: train
path: data-0.95/agt_Latn.txt
- config_name: agu_Latn
data_files:
- split: train
path: data-0.95/agu_Latn.txt
- config_name: agw_Latn
data_files:
- split: train
path: data-0.95/agw_Latn.txt
- config_name: agx_Cyrl
data_files:
- split: train
path: data-0.95/agx_Cyrl.txt
- config_name: aha_Latn
data_files:
- split: train
path: data-0.95/aha_Latn.txt
- config_name: ahk_Latn
data_files:
- split: train
path: data-0.95/ahk_Latn.txt
- config_name: aia_Latn
data_files:
- split: train
path: data-0.95/aia_Latn.txt
- config_name: aii_Syrc
data_files:
- split: train
path: data-0.95/aii_Syrc.txt
- config_name: aim_Latn
data_files:
- split: train
path: data-0.95/aim_Latn.txt
- config_name: ain_Latn
data_files:
- split: train
path: data-0.95/ain_Latn.txt
- config_name: ajg_Latn
data_files:
- split: train
path: data-0.95/ajg_Latn.txt
- config_name: aji_Latn
data_files:
- split: train
path: data-0.95/aji_Latn.txt
- config_name: ajz_Latn
data_files:
- split: train
path: data-0.95/ajz_Latn.txt
- config_name: akb_Latn
data_files:
- split: train
path: data-0.95/akb_Latn.txt
- config_name: ake_Latn
data_files:
- split: train
path: data-0.95/ake_Latn.txt
- config_name: akh_Latn
data_files:
- split: train
path: data-0.95/akh_Latn.txt
- config_name: akp_Latn
data_files:
- split: train
path: data-0.95/akp_Latn.txt
- config_name: ald_Latn
data_files:
- split: train
path: data-0.95/ald_Latn.txt
- config_name: alj_Latn
data_files:
- split: train
path: data-0.95/alj_Latn.txt
- config_name: aln_Latn
data_files:
- split: train
path: data-0.95/aln_Latn.txt
- config_name: alp_Latn
data_files:
- split: train
path: data-0.95/alp_Latn.txt
- config_name: alq_Latn
data_files:
- split: train
path: data-0.95/alq_Latn.txt
- config_name: als_Latn
data_files:
- split: train
path: data-0.95/als_Latn.txt
- config_name: alt_Cyrl
data_files:
- split: train
path: data-0.95/alt_Cyrl.txt
- config_name: aly_Latn
data_files:
- split: train
path: data-0.95/aly_Latn.txt
- config_name: alz_Latn
data_files:
- split: train
path: data-0.95/alz_Latn.txt
- config_name: ame_Latn
data_files:
- split: train
path: data-0.95/ame_Latn.txt
- config_name: amf_Latn
data_files:
- split: train
path: data-0.95/amf_Latn.txt
- config_name: amh_Ethi
data_files:
- split: train
path: data-0.95/amh_Ethi.txt
- config_name: ami_Latn
data_files:
- split: train
path: data-0.95/ami_Latn.txt
- config_name: amk_Latn
data_files:
- split: train
path: data-0.95/amk_Latn.txt
- config_name: amm_Latn
data_files:
- split: train
path: data-0.95/amm_Latn.txt
- config_name: amn_Latn
data_files:
- split: train
path: data-0.95/amn_Latn.txt
- config_name: amp_Latn
data_files:
- split: train
path: data-0.95/amp_Latn.txt
- config_name: amr_Latn
data_files:
- split: train
path: data-0.95/amr_Latn.txt
- config_name: amu_Latn
data_files:
- split: train
path: data-0.95/amu_Latn.txt
- config_name: amx_Latn
data_files:
- split: train
path: data-0.95/amx_Latn.txt
- config_name: ang_Latn
data_files:
- split: train
path: data-0.95/ang_Latn.txt
- config_name: anm_Latn
data_files:
- split: train
path: data-0.95/anm_Latn.txt
- config_name: ann_Latn
data_files:
- split: train
path: data-0.95/ann_Latn.txt
- config_name: anp_Deva
data_files:
- split: train
path: data-0.95/anp_Deva.txt
- config_name: anv_Latn
data_files:
- split: train
path: data-0.95/anv_Latn.txt
- config_name: any_Latn
data_files:
- split: train
path: data-0.95/any_Latn.txt
- config_name: aoi_Latn
data_files:
- split: train
path: data-0.95/aoi_Latn.txt
- config_name: aoj_Latn
data_files:
- split: train
path: data-0.95/aoj_Latn.txt
- config_name: aom_Latn
data_files:
- split: train
path: data-0.95/aom_Latn.txt
- config_name: aoz_Latn
data_files:
- split: train
path: data-0.95/aoz_Latn.txt
- config_name: apb_Latn
data_files:
- split: train
path: data-0.95/apb_Latn.txt
- config_name: apc_Arab
data_files:
- split: train
path: data-0.95/apc_Arab.txt
- config_name: ape_Latn
data_files:
- split: train
path: data-0.95/ape_Latn.txt
- config_name: apn_Latn
data_files:
- split: train
path: data-0.95/apn_Latn.txt
- config_name: apr_Latn
data_files:
- split: train
path: data-0.95/apr_Latn.txt
- config_name: apt_Latn
data_files:
- split: train
path: data-0.95/apt_Latn.txt
- config_name: apu_Latn
data_files:
- split: train
path: data-0.95/apu_Latn.txt
- config_name: apw_Latn
data_files:
- split: train
path: data-0.95/apw_Latn.txt
- config_name: apy_Latn
data_files:
- split: train
path: data-0.95/apy_Latn.txt
- config_name: apz_Latn
data_files:
- split: train
path: data-0.95/apz_Latn.txt
- config_name: aqz_Latn
data_files:
- split: train
path: data-0.95/aqz_Latn.txt
- config_name: arb_Arab
data_files:
- split: train
path: data-0.95/arb_Arab.txt
- config_name: arb_Latn
data_files:
- split: train
path: data-0.95/arb_Latn.txt
- config_name: arc_Syrc
data_files:
- split: train
path: data-0.95/arc_Syrc.txt
- config_name: are_Latn
data_files:
- split: train
path: data-0.95/are_Latn.txt
- config_name: arg_Latn
data_files:
- split: train
path: data-0.95/arg_Latn.txt
- config_name: arl_Latn
data_files:
- split: train
path: data-0.95/arl_Latn.txt
- config_name: arn_Latn
data_files:
- split: train
path: data-0.95/arn_Latn.txt
- config_name: arp_Latn
data_files:
- split: train
path: data-0.95/arp_Latn.txt
- config_name: arq_Arab
data_files:
- split: train
path: data-0.95/arq_Arab.txt
- config_name: arr_Latn
data_files:
- split: train
path: data-0.95/arr_Latn.txt
- config_name: ars_Arab
data_files:
- split: train
path: data-0.95/ars_Arab.txt
- config_name: ary_Arab
data_files:
- split: train
path: data-0.95/ary_Arab.txt
- config_name: arz_Arab
data_files:
- split: train
path: data-0.95/arz_Arab.txt
- config_name: asg_Latn
data_files:
- split: train
path: data-0.95/asg_Latn.txt
- config_name: asm_Beng
data_files:
- split: train
path: data-0.95/asm_Beng.txt
- config_name: asm_Latn
data_files:
- split: train
path: data-0.95/asm_Latn.txt
- config_name: aso_Latn
data_files:
- split: train
path: data-0.95/aso_Latn.txt
- config_name: ast_Latn
data_files:
- split: train
path: data-0.95/ast_Latn.txt
- config_name: ata_Latn
data_files:
- split: train
path: data-0.95/ata_Latn.txt
- config_name: atb_Latn
data_files:
- split: train
path: data-0.95/atb_Latn.txt
- config_name: atd_Latn
data_files:
- split: train
path: data-0.95/atd_Latn.txt
- config_name: atg_Latn
data_files:
- split: train
path: data-0.95/atg_Latn.txt
- config_name: ati_Latn
data_files:
- split: train
path: data-0.95/ati_Latn.txt
- config_name: atj_Latn
data_files:
- split: train
path: data-0.95/atj_Latn.txt
- config_name: atq_Latn
data_files:
- split: train
path: data-0.95/atq_Latn.txt
- config_name: att_Latn
data_files:
- split: train
path: data-0.95/att_Latn.txt
- config_name: auc_Latn
data_files:
- split: train
path: data-0.95/auc_Latn.txt
- config_name: aui_Latn
data_files:
- split: train
path: data-0.95/aui_Latn.txt
- config_name: auy_Latn
data_files:
- split: train
path: data-0.95/auy_Latn.txt
- config_name: ava_Cyrl
data_files:
- split: train
path: data-0.95/ava_Cyrl.txt
- config_name: avk_Latn
data_files:
- split: train
path: data-0.95/avk_Latn.txt
- config_name: avn_Latn
data_files:
- split: train
path: data-0.95/avn_Latn.txt
- config_name: avt_Latn
data_files:
- split: train
path: data-0.95/avt_Latn.txt
- config_name: avu_Latn
data_files:
- split: train
path: data-0.95/avu_Latn.txt
- config_name: awa_Deva
data_files:
- split: train
path: data-0.95/awa_Deva.txt
- config_name: awb_Latn
data_files:
- split: train
path: data-0.95/awb_Latn.txt
- config_name: awi_Latn
data_files:
- split: train
path: data-0.95/awi_Latn.txt
- config_name: awx_Latn
data_files:
- split: train
path: data-0.95/awx_Latn.txt
- config_name: ayo_Latn
data_files:
- split: train
path: data-0.95/ayo_Latn.txt
- config_name: ayp_Arab
data_files:
- split: train
path: data-0.95/ayp_Arab.txt
- config_name: ayr_Latn
data_files:
- split: train
path: data-0.95/ayr_Latn.txt
- config_name: azb_Arab
data_files:
- split: train
path: data-0.95/azb_Arab.txt
- config_name: azg_Latn
data_files:
- split: train
path: data-0.95/azg_Latn.txt
- config_name: azj_Cyrl
data_files:
- split: train
path: data-0.95/azj_Cyrl.txt
- config_name: azj_Latn
data_files:
- split: train
path: data-0.95/azj_Latn.txt
- config_name: azz_Latn
data_files:
- split: train
path: data-0.95/azz_Latn.txt
- config_name: bak_Cyrl
data_files:
- split: train
path: data-0.95/bak_Cyrl.txt
- config_name: bal_Arab
data_files:
- split: train
path: data-0.95/bal_Arab.txt
- config_name: bam_Latn
data_files:
- split: train
path: data-0.95/bam_Latn.txt
- config_name: ban_Latn
data_files:
- split: train
path: data-0.95/ban_Latn.txt
- config_name: bao_Latn
data_files:
- split: train
path: data-0.95/bao_Latn.txt
- config_name: bar_Latn
data_files:
- split: train
path: data-0.95/bar_Latn.txt
- config_name: bas_Latn
data_files:
- split: train
path: data-0.95/bas_Latn.txt
- config_name: bav_Latn
data_files:
- split: train
path: data-0.95/bav_Latn.txt
- config_name: bba_Latn
data_files:
- split: train
path: data-0.95/bba_Latn.txt
- config_name: bbb_Latn
data_files:
- split: train
path: data-0.95/bbb_Latn.txt
- config_name: bbc_Latn
data_files:
- split: train
path: data-0.95/bbc_Latn.txt
- config_name: bbj_Latn
data_files:
- split: train
path: data-0.95/bbj_Latn.txt
- config_name: bbk_Latn
data_files:
- split: train
path: data-0.95/bbk_Latn.txt
- config_name: bbo_Latn
data_files:
- split: train
path: data-0.95/bbo_Latn.txt
- config_name: bbr_Latn
data_files:
- split: train
path: data-0.95/bbr_Latn.txt
- config_name: bch_Latn
data_files:
- split: train
path: data-0.95/bch_Latn.txt
- config_name: bci_Latn
data_files:
- split: train
path: data-0.95/bci_Latn.txt
- config_name: bcl_Latn
data_files:
- split: train
path: data-0.95/bcl_Latn.txt
- config_name: bco_Latn
data_files:
- split: train
path: data-0.95/bco_Latn.txt
- config_name: bcw_Latn
data_files:
- split: train
path: data-0.95/bcw_Latn.txt
- config_name: bdd_Latn
data_files:
- split: train
path: data-0.95/bdd_Latn.txt
- config_name: bdh_Latn
data_files:
- split: train
path: data-0.95/bdh_Latn.txt
- config_name: bdq_Latn
data_files:
- split: train
path: data-0.95/bdq_Latn.txt
- config_name: bea_Latn
data_files:
- split: train
path: data-0.95/bea_Latn.txt
- config_name: bef_Latn
data_files:
- split: train
path: data-0.95/bef_Latn.txt
- config_name: bel_Cyrl
data_files:
- split: train
path: data-0.95/bel_Cyrl.txt
- config_name: bem_Latn
data_files:
- split: train
path: data-0.95/bem_Latn.txt
- config_name: ben_Beng
data_files:
- split: train
path: data-0.95/ben_Beng.txt
- config_name: ben_Latn
data_files:
- split: train
path: data-0.95/ben_Latn.txt
- config_name: beq_Latn
data_files:
- split: train
path: data-0.95/beq_Latn.txt
- config_name: bew_Latn
data_files:
- split: train
path: data-0.95/bew_Latn.txt
- config_name: bex_Latn
data_files:
- split: train
path: data-0.95/bex_Latn.txt
- config_name: bfd_Latn
data_files:
- split: train
path: data-0.95/bfd_Latn.txt
- config_name: bfo_Latn
data_files:
- split: train
path: data-0.95/bfo_Latn.txt
- config_name: bgr_Latn
data_files:
- split: train
path: data-0.95/bgr_Latn.txt
- config_name: bgs_Latn
data_files:
- split: train
path: data-0.95/bgs_Latn.txt
- config_name: bgt_Latn
data_files:
- split: train
path: data-0.95/bgt_Latn.txt
- config_name: bgz_Latn
data_files:
- split: train
path: data-0.95/bgz_Latn.txt
- config_name: bhg_Latn
data_files:
- split: train
path: data-0.95/bhg_Latn.txt
- config_name: bhl_Latn
data_files:
- split: train
path: data-0.95/bhl_Latn.txt
- config_name: bho_Deva
data_files:
- split: train
path: data-0.95/bho_Deva.txt
- config_name: bhp_Latn
data_files:
- split: train
path: data-0.95/bhp_Latn.txt
- config_name: bhw_Latn
data_files:
- split: train
path: data-0.95/bhw_Latn.txt
- config_name: bhz_Latn
data_files:
- split: train
path: data-0.95/bhz_Latn.txt
- config_name: bib_Latn
data_files:
- split: train
path: data-0.95/bib_Latn.txt
- config_name: big_Latn
data_files:
- split: train
path: data-0.95/big_Latn.txt
- config_name: bim_Latn
data_files:
- split: train
path: data-0.95/bim_Latn.txt
- config_name: bin_Latn
data_files:
- split: train
path: data-0.95/bin_Latn.txt
- config_name: bis_Latn
data_files:
- split: train
path: data-0.95/bis_Latn.txt
- config_name: biu_Latn
data_files:
- split: train
path: data-0.95/biu_Latn.txt
- config_name: biv_Latn
data_files:
- split: train
path: data-0.95/biv_Latn.txt
- config_name: bjn_Arab
data_files:
- split: train
path: data-0.95/bjn_Arab.txt
- config_name: bjn_Latn
data_files:
- split: train
path: data-0.95/bjn_Latn.txt
- config_name: bjp_Latn
data_files:
- split: train
path: data-0.95/bjp_Latn.txt
- config_name: bjr_Latn
data_files:
- split: train
path: data-0.95/bjr_Latn.txt
- config_name: bjv_Latn
data_files:
- split: train
path: data-0.95/bjv_Latn.txt
- config_name: bkd_Latn
data_files:
- split: train
path: data-0.95/bkd_Latn.txt
- config_name: bkl_Latn
data_files:
- split: train
path: data-0.95/bkl_Latn.txt
- config_name: bkq_Latn
data_files:
- split: train
path: data-0.95/bkq_Latn.txt
- config_name: bku_Latn
data_files:
- split: train
path: data-0.95/bku_Latn.txt
- config_name: bkv_Latn
data_files:
- split: train
path: data-0.95/bkv_Latn.txt
- config_name: bla_Latn
data_files:
- split: train
path: data-0.95/bla_Latn.txt
- config_name: blh_Latn
data_files:
- split: train
path: data-0.95/blh_Latn.txt
- config_name: blk_Mymr
data_files:
- split: train
path: data-0.95/blk_Mymr.txt
- config_name: blt_Latn
data_files:
- split: train
path: data-0.95/blt_Latn.txt
- config_name: blw_Latn
data_files:
- split: train
path: data-0.95/blw_Latn.txt
- config_name: blz_Latn
data_files:
- split: train
path: data-0.95/blz_Latn.txt
- config_name: bmh_Latn
data_files:
- split: train
path: data-0.95/bmh_Latn.txt
- config_name: bmk_Latn
data_files:
- split: train
path: data-0.95/bmk_Latn.txt
- config_name: bmq_Latn
data_files:
- split: train
path: data-0.95/bmq_Latn.txt
- config_name: bmr_Latn
data_files:
- split: train
path: data-0.95/bmr_Latn.txt
- config_name: bmu_Latn
data_files:
- split: train
path: data-0.95/bmu_Latn.txt
- config_name: bmv_Latn
data_files:
- split: train
path: data-0.95/bmv_Latn.txt
- config_name: bnj_Latn
data_files:
- split: train
path: data-0.95/bnj_Latn.txt
- config_name: bno_Latn
data_files:
- split: train
path: data-0.95/bno_Latn.txt
- config_name: bnp_Latn
data_files:
- split: train
path: data-0.95/bnp_Latn.txt
- config_name: boa_Latn
data_files:
- split: train
path: data-0.95/boa_Latn.txt
- config_name: bod_Tibt
data_files:
- split: train
path: data-0.95/bod_Tibt.txt
- config_name: boj_Latn
data_files:
- split: train
path: data-0.95/boj_Latn.txt
- config_name: bom_Latn
data_files:
- split: train
path: data-0.95/bom_Latn.txt
- config_name: bon_Latn
data_files:
- split: train
path: data-0.95/bon_Latn.txt
- config_name: bor_Latn
data_files:
- split: train
path: data-0.95/bor_Latn.txt
- config_name: bov_Latn
data_files:
- split: train
path: data-0.95/bov_Latn.txt
- config_name: box_Latn
data_files:
- split: train
path: data-0.95/box_Latn.txt
- config_name: bpr_Latn
data_files:
- split: train
path: data-0.95/bpr_Latn.txt
- config_name: bps_Latn
data_files:
- split: train
path: data-0.95/bps_Latn.txt
- config_name: bpy_Beng
data_files:
- split: train
path: data-0.95/bpy_Beng.txt
- config_name: bqc_Latn
data_files:
- split: train
path: data-0.95/bqc_Latn.txt
- config_name: bqj_Latn
data_files:
- split: train
path: data-0.95/bqj_Latn.txt
- config_name: bqp_Latn
data_files:
- split: train
path: data-0.95/bqp_Latn.txt
- config_name: bre_Latn
data_files:
- split: train
path: data-0.95/bre_Latn.txt
- config_name: brh_Arab
data_files:
- split: train
path: data-0.95/brh_Arab.txt
- config_name: bru_Latn
data_files:
- split: train
path: data-0.95/bru_Latn.txt
- config_name: brx_Deva
data_files:
- split: train
path: data-0.95/brx_Deva.txt
- config_name: brx_Latn
data_files:
- split: train
path: data-0.95/brx_Latn.txt
- config_name: bsc_Latn
data_files:
- split: train
path: data-0.95/bsc_Latn.txt
- config_name: bsn_Latn
data_files:
- split: train
path: data-0.95/bsn_Latn.txt
- config_name: bsp_Latn
data_files:
- split: train
path: data-0.95/bsp_Latn.txt
- config_name: bsq_Latn
data_files:
- split: train
path: data-0.95/bsq_Latn.txt
- config_name: bss_Latn
data_files:
- split: train
path: data-0.95/bss_Latn.txt
- config_name: btd_Latn
data_files:
- split: train
path: data-0.95/btd_Latn.txt
- config_name: bth_Latn
data_files:
- split: train
path: data-0.95/bth_Latn.txt
- config_name: bts_Latn
data_files:
- split: train
path: data-0.95/bts_Latn.txt
- config_name: btt_Latn
data_files:
- split: train
path: data-0.95/btt_Latn.txt
- config_name: btx_Latn
data_files:
- split: train
path: data-0.95/btx_Latn.txt
- config_name: bud_Latn
data_files:
- split: train
path: data-0.95/bud_Latn.txt
- config_name: bug_Latn
data_files:
- split: train
path: data-0.95/bug_Latn.txt
- config_name: buk_Latn
data_files:
- split: train
path: data-0.95/buk_Latn.txt
- config_name: bul_Cyrl
data_files:
- split: train
path: data-0.95/bul_Cyrl.txt
- config_name: bum_Latn
data_files:
- split: train
path: data-0.95/bum_Latn.txt
- config_name: bus_Latn
data_files:
- split: train
path: data-0.95/bus_Latn.txt
- config_name: bvc_Latn
data_files:
- split: train
path: data-0.95/bvc_Latn.txt
- config_name: bvd_Latn
data_files:
- split: train
path: data-0.95/bvd_Latn.txt
- config_name: bvr_Latn
data_files:
- split: train
path: data-0.95/bvr_Latn.txt
- config_name: bvz_Latn
data_files:
- split: train
path: data-0.95/bvz_Latn.txt
- config_name: bwd_Latn
data_files:
- split: train
path: data-0.95/bwd_Latn.txt
- config_name: bwi_Latn
data_files:
- split: train
path: data-0.95/bwi_Latn.txt
- config_name: bwq_Latn
data_files:
- split: train
path: data-0.95/bwq_Latn.txt
- config_name: bwu_Latn
data_files:
- split: train
path: data-0.95/bwu_Latn.txt
- config_name: bxh_Latn
data_files:
- split: train
path: data-0.95/bxh_Latn.txt
- config_name: bxr_Cyrl
data_files:
- split: train
path: data-0.95/bxr_Cyrl.txt
- config_name: byr_Latn
data_files:
- split: train
path: data-0.95/byr_Latn.txt
- config_name: byv_Latn
data_files:
- split: train
path: data-0.95/byv_Latn.txt
- config_name: byx_Latn
data_files:
- split: train
path: data-0.95/byx_Latn.txt
- config_name: bzd_Latn
data_files:
- split: train
path: data-0.95/bzd_Latn.txt
- config_name: bzh_Latn
data_files:
- split: train
path: data-0.95/bzh_Latn.txt
- config_name: bzi_Thai
data_files:
- split: train
path: data-0.95/bzi_Thai.txt
- config_name: bzj_Latn
data_files:
- split: train
path: data-0.95/bzj_Latn.txt
- config_name: caa_Latn
data_files:
- split: train
path: data-0.95/caa_Latn.txt
- config_name: cab_Latn
data_files:
- split: train
path: data-0.95/cab_Latn.txt
- config_name: cac_Latn
data_files:
- split: train
path: data-0.95/cac_Latn.txt
- config_name: caf_Latn
data_files:
- split: train
path: data-0.95/caf_Latn.txt
- config_name: cag_Latn
data_files:
- split: train
path: data-0.95/cag_Latn.txt
- config_name: cak_Latn
data_files:
- split: train
path: data-0.95/cak_Latn.txt
- config_name: cao_Latn
data_files:
- split: train
path: data-0.95/cao_Latn.txt
- config_name: cap_Latn
data_files:
- split: train
path: data-0.95/cap_Latn.txt
- config_name: caq_Latn
data_files:
- split: train
path: data-0.95/caq_Latn.txt
- config_name: car_Latn
data_files:
- split: train
path: data-0.95/car_Latn.txt
- config_name: cas_Latn
data_files:
- split: train
path: data-0.95/cas_Latn.txt
- config_name: cat_Latn
data_files:
- split: train
path: data-0.95/cat_Latn.txt
- config_name: cav_Latn
data_files:
- split: train
path: data-0.95/cav_Latn.txt
- config_name: cax_Latn
data_files:
- split: train
path: data-0.95/cax_Latn.txt
- config_name: cbc_Latn
data_files:
- split: train
path: data-0.95/cbc_Latn.txt
- config_name: cbi_Latn
data_files:
- split: train
path: data-0.95/cbi_Latn.txt
- config_name: cbk_Latn
data_files:
- split: train
path: data-0.95/cbk_Latn.txt
- config_name: cbr_Latn
data_files:
- split: train
path: data-0.95/cbr_Latn.txt
- config_name: cbs_Latn
data_files:
- split: train
path: data-0.95/cbs_Latn.txt
- config_name: cbt_Latn
data_files:
- split: train
path: data-0.95/cbt_Latn.txt
- config_name: cbu_Latn
data_files:
- split: train
path: data-0.95/cbu_Latn.txt
- config_name: cbv_Latn
data_files:
- split: train
path: data-0.95/cbv_Latn.txt
- config_name: cce_Latn
data_files:
- split: train
path: data-0.95/cce_Latn.txt
- config_name: cco_Latn
data_files:
- split: train
path: data-0.95/cco_Latn.txt
- config_name: ccp_Latn
data_files:
- split: train
path: data-0.95/ccp_Latn.txt
- config_name: cdf_Latn
data_files:
- split: train
path: data-0.95/cdf_Latn.txt
- config_name: cdo_Latn
data_files:
- split: train
path: data-0.95/cdo_Latn.txt
- config_name: ceb_Latn
data_files:
- split: train
path: data-0.95/ceb_Latn.txt
- config_name: ceg_Latn
data_files:
- split: train
path: data-0.95/ceg_Latn.txt
- config_name: cek_Latn
data_files:
- split: train
path: data-0.95/cek_Latn.txt
- config_name: ces_Latn
data_files:
- split: train
path: data-0.95/ces_Latn.txt
- config_name: cfm_Latn
data_files:
- split: train
path: data-0.95/cfm_Latn.txt
- config_name: cgc_Latn
data_files:
- split: train
path: data-0.95/cgc_Latn.txt
- config_name: cgg_Latn
data_files:
- split: train
path: data-0.95/cgg_Latn.txt
- config_name: cha_Latn
data_files:
- split: train
path: data-0.95/cha_Latn.txt
- config_name: chd_Latn
data_files:
- split: train
path: data-0.95/chd_Latn.txt
- config_name: che_Cyrl
data_files:
- split: train
path: data-0.95/che_Cyrl.txt
- config_name: chf_Latn
data_files:
- split: train
path: data-0.95/chf_Latn.txt
- config_name: chj_Latn
data_files:
- split: train
path: data-0.95/chj_Latn.txt
- config_name: chk_Latn
data_files:
- split: train
path: data-0.95/chk_Latn.txt
- config_name: cho_Latn
data_files:
- split: train
path: data-0.95/cho_Latn.txt
- config_name: chq_Latn
data_files:
- split: train
path: data-0.95/chq_Latn.txt
- config_name: chr_Cher
data_files:
- split: train
path: data-0.95/chr_Cher.txt
- config_name: chr_Latn
data_files:
- split: train
path: data-0.95/chr_Latn.txt
- config_name: chu_Cyrl
data_files:
- split: train
path: data-0.95/chu_Cyrl.txt
- config_name: chv_Cyrl
data_files:
- split: train
path: data-0.95/chv_Cyrl.txt
- config_name: chw_Latn
data_files:
- split: train
path: data-0.95/chw_Latn.txt
- config_name: chz_Latn
data_files:
- split: train
path: data-0.95/chz_Latn.txt
- config_name: cjk_Latn
data_files:
- split: train
path: data-0.95/cjk_Latn.txt
- config_name: cjo_Latn
data_files:
- split: train
path: data-0.95/cjo_Latn.txt
- config_name: cjp_Latn
data_files:
- split: train
path: data-0.95/cjp_Latn.txt
- config_name: cjs_Cyrl
data_files:
- split: train
path: data-0.95/cjs_Cyrl.txt
- config_name: cjv_Latn
data_files:
- split: train
path: data-0.95/cjv_Latn.txt
- config_name: ckb_Arab
data_files:
- split: train
path: data-0.95/ckb_Arab.txt
- config_name: cko_Latn
data_files:
- split: train
path: data-0.95/cko_Latn.txt
- config_name: ckt_Cyrl
data_files:
- split: train
path: data-0.95/ckt_Cyrl.txt
- config_name: cle_Latn
data_files:
- split: train
path: data-0.95/cle_Latn.txt
- config_name: clu_Latn
data_files:
- split: train
path: data-0.95/clu_Latn.txt
- config_name: cly_Latn
data_files:
- split: train
path: data-0.95/cly_Latn.txt
- config_name: cme_Latn
data_files:
- split: train
path: data-0.95/cme_Latn.txt
- config_name: cmn_Hani
data_files:
- split: train
path: data-0.95/cmn_Hani.txt
- config_name: cmo_Khmr
data_files:
- split: train
path: data-0.95/cmo_Khmr.txt
- config_name: cmo_Latn
data_files:
- split: train
path: data-0.95/cmo_Latn.txt
- config_name: cmr_Latn
data_files:
- split: train
path: data-0.95/cmr_Latn.txt
- config_name: cnh_Latn
data_files:
- split: train
path: data-0.95/cnh_Latn.txt
- config_name: cni_Latn
data_files:
- split: train
path: data-0.95/cni_Latn.txt
- config_name: cnk_Latn
data_files:
- split: train
path: data-0.95/cnk_Latn.txt
- config_name: cnl_Latn
data_files:
- split: train
path: data-0.95/cnl_Latn.txt
- config_name: cnt_Latn
data_files:
- split: train
path: data-0.95/cnt_Latn.txt
- config_name: cnw_Latn
data_files:
- split: train
path: data-0.95/cnw_Latn.txt
- config_name: coe_Latn
data_files:
- split: train
path: data-0.95/coe_Latn.txt
- config_name: cof_Latn
data_files:
- split: train
path: data-0.95/cof_Latn.txt
- config_name: cok_Latn
data_files:
- split: train
path: data-0.95/cok_Latn.txt
- config_name: con_Latn
data_files:
- split: train
path: data-0.95/con_Latn.txt
- config_name: cop_Copt
data_files:
- split: train
path: data-0.95/cop_Copt.txt
- config_name: cor_Latn
data_files:
- split: train
path: data-0.95/cor_Latn.txt
- config_name: cos_Latn
data_files:
- split: train
path: data-0.95/cos_Latn.txt
- config_name: cot_Latn
data_files:
- split: train
path: data-0.95/cot_Latn.txt
- config_name: cou_Latn
data_files:
- split: train
path: data-0.95/cou_Latn.txt
- config_name: cpa_Latn
data_files:
- split: train
path: data-0.95/cpa_Latn.txt
- config_name: cpb_Latn
data_files:
- split: train
path: data-0.95/cpb_Latn.txt
- config_name: cpc_Latn
data_files:
- split: train
path: data-0.95/cpc_Latn.txt
- config_name: cpu_Latn
data_files:
- split: train
path: data-0.95/cpu_Latn.txt
- config_name: cpy_Latn
data_files:
- split: train
path: data-0.95/cpy_Latn.txt
- config_name: crh_Cyrl
data_files:
- split: train
path: data-0.95/crh_Cyrl.txt
- config_name: crh_Latn
data_files:
- split: train
path: data-0.95/crh_Latn.txt
- config_name: cri_Latn
data_files:
- split: train
path: data-0.95/cri_Latn.txt
- config_name: crj_Cans
data_files:
- split: train
path: data-0.95/crj_Cans.txt
- config_name: crk_Cans
data_files:
- split: train
path: data-0.95/crk_Cans.txt
- config_name: crk_Latn
data_files:
- split: train
path: data-0.95/crk_Latn.txt
- config_name: crl_Cans
data_files:
- split: train
path: data-0.95/crl_Cans.txt
- config_name: crm_Cans
data_files:
- split: train
path: data-0.95/crm_Cans.txt
- config_name: crn_Latn
data_files:
- split: train
path: data-0.95/crn_Latn.txt
- config_name: crs_Latn
data_files:
- split: train
path: data-0.95/crs_Latn.txt
- config_name: crt_Latn
data_files:
- split: train
path: data-0.95/crt_Latn.txt
- config_name: crx_Latn
data_files:
- split: train
path: data-0.95/crx_Latn.txt
- config_name: csb_Latn
data_files:
- split: train
path: data-0.95/csb_Latn.txt
- config_name: csk_Latn
data_files:
- split: train
path: data-0.95/csk_Latn.txt
- config_name: cso_Latn
data_files:
- split: train
path: data-0.95/cso_Latn.txt
- config_name: csw_Cans
data_files:
- split: train
path: data-0.95/csw_Cans.txt
- config_name: csw_Latn
data_files:
- split: train
path: data-0.95/csw_Latn.txt
- config_name: csy_Latn
data_files:
- split: train
path: data-0.95/csy_Latn.txt
- config_name: cta_Latn
data_files:
- split: train
path: data-0.95/cta_Latn.txt
- config_name: ctd_Latn
data_files:
- split: train
path: data-0.95/ctd_Latn.txt
- config_name: cto_Latn
data_files:
- split: train
path: data-0.95/cto_Latn.txt
- config_name: ctp_Latn
data_files:
- split: train
path: data-0.95/ctp_Latn.txt
- config_name: ctu_Latn
data_files:
- split: train
path: data-0.95/ctu_Latn.txt
- config_name: cub_Latn
data_files:
- split: train
path: data-0.95/cub_Latn.txt
- config_name: cuc_Latn
data_files:
- split: train
path: data-0.95/cuc_Latn.txt
- config_name: cui_Latn
data_files:
- split: train
path: data-0.95/cui_Latn.txt
- config_name: cuk_Latn
data_files:
- split: train
path: data-0.95/cuk_Latn.txt
- config_name: cul_Latn
data_files:
- split: train
path: data-0.95/cul_Latn.txt
- config_name: cut_Latn
data_files:
- split: train
path: data-0.95/cut_Latn.txt
- config_name: cux_Latn
data_files:
- split: train
path: data-0.95/cux_Latn.txt
- config_name: cwd_Cans
data_files:
- split: train
path: data-0.95/cwd_Cans.txt
- config_name: cwe_Latn
data_files:
- split: train
path: data-0.95/cwe_Latn.txt
- config_name: cwt_Latn
data_files:
- split: train
path: data-0.95/cwt_Latn.txt
- config_name: cya_Latn
data_files:
- split: train
path: data-0.95/cya_Latn.txt
- config_name: cym_Latn
data_files:
- split: train
path: data-0.95/cym_Latn.txt
- config_name: czt_Latn
data_files:
- split: train
path: data-0.95/czt_Latn.txt
- config_name: daa_Latn
data_files:
- split: train
path: data-0.95/daa_Latn.txt
- config_name: dad_Latn
data_files:
- split: train
path: data-0.95/dad_Latn.txt
- config_name: daf_Latn
data_files:
- split: train
path: data-0.95/daf_Latn.txt
- config_name: dag_Latn
data_files:
- split: train
path: data-0.95/dag_Latn.txt
- config_name: dah_Latn
data_files:
- split: train
path: data-0.95/dah_Latn.txt
- config_name: dak_Latn
data_files:
- split: train
path: data-0.95/dak_Latn.txt
- config_name: dan_Latn
data_files:
- split: train
path: data-0.95/dan_Latn.txt
- config_name: dar_Cyrl
data_files:
- split: train
path: data-0.95/dar_Cyrl.txt
- config_name: dbq_Latn
data_files:
- split: train
path: data-0.95/dbq_Latn.txt
- config_name: ddg_Latn
data_files:
- split: train
path: data-0.95/ddg_Latn.txt
- config_name: ddn_Latn
data_files:
- split: train
path: data-0.95/ddn_Latn.txt
- config_name: ded_Latn
data_files:
- split: train
path: data-0.95/ded_Latn.txt
- config_name: des_Latn
data_files:
- split: train
path: data-0.95/des_Latn.txt
- config_name: deu_Latn
data_files:
- split: train
path: data-0.95/deu_Latn.txt
- config_name: dga_Latn
data_files:
- split: train
path: data-0.95/dga_Latn.txt
- config_name: dgc_Latn
data_files:
- split: train
path: data-0.95/dgc_Latn.txt
- config_name: dgi_Latn
data_files:
- split: train
path: data-0.95/dgi_Latn.txt
- config_name: dgr_Latn
data_files:
- split: train
path: data-0.95/dgr_Latn.txt
- config_name: dgz_Latn
data_files:
- split: train
path: data-0.95/dgz_Latn.txt
- config_name: dhg_Latn
data_files:
- split: train
path: data-0.95/dhg_Latn.txt
- config_name: dhm_Latn
data_files:
- split: train
path: data-0.95/dhm_Latn.txt
- config_name: dhv_Latn
data_files:
- split: train
path: data-0.95/dhv_Latn.txt
- config_name: did_Latn
data_files:
- split: train
path: data-0.95/did_Latn.txt
- config_name: dig_Latn
data_files:
- split: train
path: data-0.95/dig_Latn.txt
- config_name: dik_Latn
data_files:
- split: train
path: data-0.95/dik_Latn.txt
- config_name: dip_Latn
data_files:
- split: train
path: data-0.95/dip_Latn.txt
- config_name: diq_Latn
data_files:
- split: train
path: data-0.95/diq_Latn.txt
- config_name: dis_Latn
data_files:
- split: train
path: data-0.95/dis_Latn.txt
- config_name: diu_Latn
data_files:
- split: train
path: data-0.95/diu_Latn.txt
- config_name: div_Thaa
data_files:
- split: train
path: data-0.95/div_Thaa.txt
- config_name: dje_Latn
data_files:
- split: train
path: data-0.95/dje_Latn.txt
- config_name: djk_Latn
data_files:
- split: train
path: data-0.95/djk_Latn.txt
- config_name: djr_Latn
data_files:
- split: train
path: data-0.95/djr_Latn.txt
- config_name: dks_Latn
data_files:
- split: train
path: data-0.95/dks_Latn.txt
- config_name: dln_Latn
data_files:
- split: train
path: data-0.95/dln_Latn.txt
- config_name: dng_Cyrl
data_files:
- split: train
path: data-0.95/dng_Cyrl.txt
- config_name: dnj_Latn
data_files:
- split: train
path: data-0.95/dnj_Latn.txt
- config_name: dnw_Latn
data_files:
- split: train
path: data-0.95/dnw_Latn.txt
- config_name: dob_Latn
data_files:
- split: train
path: data-0.95/dob_Latn.txt
- config_name: doi_Deva
data_files:
- split: train
path: data-0.95/doi_Deva.txt
- config_name: dop_Latn
data_files:
- split: train
path: data-0.95/dop_Latn.txt
- config_name: dos_Latn
data_files:
- split: train
path: data-0.95/dos_Latn.txt
- config_name: dow_Latn
data_files:
- split: train
path: data-0.95/dow_Latn.txt
- config_name: drg_Latn
data_files:
- split: train
path: data-0.95/drg_Latn.txt
- config_name: dru_Latn
data_files:
- split: train
path: data-0.95/dru_Latn.txt
- config_name: dsb_Latn
data_files:
- split: train
path: data-0.95/dsb_Latn.txt
- config_name: dsh_Latn
data_files:
- split: train
path: data-0.95/dsh_Latn.txt
- config_name: dtb_Latn
data_files:
- split: train
path: data-0.95/dtb_Latn.txt
- config_name: dtp_Latn
data_files:
- split: train
path: data-0.95/dtp_Latn.txt
- config_name: dts_Latn
data_files:
- split: train
path: data-0.95/dts_Latn.txt
- config_name: dty_Deva
data_files:
- split: train
path: data-0.95/dty_Deva.txt
- config_name: dua_Latn
data_files:
- split: train
path: data-0.95/dua_Latn.txt
- config_name: due_Latn
data_files:
- split: train
path: data-0.95/due_Latn.txt
- config_name: dug_Latn
data_files:
- split: train
path: data-0.95/dug_Latn.txt
- config_name: duo_Latn
data_files:
- split: train
path: data-0.95/duo_Latn.txt
- config_name: dur_Latn
data_files:
- split: train
path: data-0.95/dur_Latn.txt
- config_name: dwr_Ethi
data_files:
- split: train
path: data-0.95/dwr_Ethi.txt
- config_name: dwr_Latn
data_files:
- split: train
path: data-0.95/dwr_Latn.txt
- config_name: dww_Latn
data_files:
- split: train
path: data-0.95/dww_Latn.txt
- config_name: dyi_Latn
data_files:
- split: train
path: data-0.95/dyi_Latn.txt
- config_name: dyo_Latn
data_files:
- split: train
path: data-0.95/dyo_Latn.txt
- config_name: dyu_Latn
data_files:
- split: train
path: data-0.95/dyu_Latn.txt
- config_name: dzo_Tibt
data_files:
- split: train
path: data-0.95/dzo_Tibt.txt
- config_name: ebk_Latn
data_files:
- split: train
path: data-0.95/ebk_Latn.txt
- config_name: efi_Latn
data_files:
- split: train
path: data-0.95/efi_Latn.txt
- config_name: eka_Latn
data_files:
- split: train
path: data-0.95/eka_Latn.txt
- config_name: ekk_Latn
data_files:
- split: train
path: data-0.95/ekk_Latn.txt
- config_name: eko_Latn
data_files:
- split: train
path: data-0.95/eko_Latn.txt
- config_name: ell_Grek
data_files:
- split: train
path: data-0.95/ell_Grek.txt
- config_name: eme_Latn
data_files:
- split: train
path: data-0.95/eme_Latn.txt
- config_name: emi_Latn
data_files:
- split: train
path: data-0.95/emi_Latn.txt
- config_name: eml_Latn
data_files:
- split: train
path: data-0.95/eml_Latn.txt
- config_name: emp_Latn
data_files:
- split: train
path: data-0.95/emp_Latn.txt
- config_name: enb_Latn
data_files:
- split: train
path: data-0.95/enb_Latn.txt
- config_name: eng_Latn
data_files:
- split: train
path: data-0.95/eng_Latn.txt
- config_name: enl_Latn
data_files:
- split: train
path: data-0.95/enl_Latn.txt
- config_name: enm_Latn
data_files:
- split: train
path: data-0.95/enm_Latn.txt
- config_name: enq_Latn
data_files:
- split: train
path: data-0.95/enq_Latn.txt
- config_name: enx_Latn
data_files:
- split: train
path: data-0.95/enx_Latn.txt
- config_name: epo_Latn
data_files:
- split: train
path: data-0.95/epo_Latn.txt
- config_name: eri_Latn
data_files:
- split: train
path: data-0.95/eri_Latn.txt
- config_name: ese_Latn
data_files:
- split: train
path: data-0.95/ese_Latn.txt
- config_name: esi_Latn
data_files:
- split: train
path: data-0.95/esi_Latn.txt
- config_name: esk_Latn
data_files:
- split: train
path: data-0.95/esk_Latn.txt
- config_name: ess_Latn
data_files:
- split: train
path: data-0.95/ess_Latn.txt
- config_name: esu_Latn
data_files:
- split: train
path: data-0.95/esu_Latn.txt
- config_name: eto_Latn
data_files:
- split: train
path: data-0.95/eto_Latn.txt
- config_name: etr_Latn
data_files:
- split: train
path: data-0.95/etr_Latn.txt
- config_name: etu_Latn
data_files:
- split: train
path: data-0.95/etu_Latn.txt
- config_name: eus_Latn
data_files:
- split: train
path: data-0.95/eus_Latn.txt
- config_name: eve_Cyrl
data_files:
- split: train
path: data-0.95/eve_Cyrl.txt
- config_name: ewe_Latn
data_files:
- split: train
path: data-0.95/ewe_Latn.txt
- config_name: ewo_Latn
data_files:
- split: train
path: data-0.95/ewo_Latn.txt
- config_name: ext_Latn
data_files:
- split: train
path: data-0.95/ext_Latn.txt
- config_name: eza_Latn
data_files:
- split: train
path: data-0.95/eza_Latn.txt
- config_name: faa_Latn
data_files:
- split: train
path: data-0.95/faa_Latn.txt
- config_name: fad_Latn
data_files:
- split: train
path: data-0.95/fad_Latn.txt
- config_name: fai_Latn
data_files:
- split: train
path: data-0.95/fai_Latn.txt
- config_name: fal_Latn
data_files:
- split: train
path: data-0.95/fal_Latn.txt
- config_name: fan_Latn
data_files:
- split: train
path: data-0.95/fan_Latn.txt
- config_name: fao_Latn
data_files:
- split: train
path: data-0.95/fao_Latn.txt
- config_name: far_Latn
data_files:
- split: train
path: data-0.95/far_Latn.txt
- config_name: fas_Arab
data_files:
- split: train
path: data-0.95/fas_Arab.txt
- config_name: fat_Latn
data_files:
- split: train
path: data-0.95/fat_Latn.txt
- config_name: ffm_Latn
data_files:
- split: train
path: data-0.95/ffm_Latn.txt
- config_name: fij_Latn
data_files:
- split: train
path: data-0.95/fij_Latn.txt
- config_name: fil_Latn
data_files:
- split: train
path: data-0.95/fil_Latn.txt
- config_name: fin_Latn
data_files:
- split: train
path: data-0.95/fin_Latn.txt
- config_name: fit_Latn
data_files:
- split: train
path: data-0.95/fit_Latn.txt
- config_name: fkv_Latn
data_files:
- split: train
path: data-0.95/fkv_Latn.txt
- config_name: fmu_Deva
data_files:
- split: train
path: data-0.95/fmu_Deva.txt
- config_name: fon_Latn
data_files:
- split: train
path: data-0.95/fon_Latn.txt
- config_name: for_Latn
data_files:
- split: train
path: data-0.95/for_Latn.txt
- config_name: fra_Latn
data_files:
- split: train
path: data-0.95/fra_Latn.txt
- config_name: frd_Latn
data_files:
- split: train
path: data-0.95/frd_Latn.txt
- config_name: fro_Latn
data_files:
- split: train
path: data-0.95/fro_Latn.txt
- config_name: frp_Latn
data_files:
- split: train
path: data-0.95/frp_Latn.txt
- config_name: frr_Latn
data_files:
- split: train
path: data-0.95/frr_Latn.txt
- config_name: fry_Latn
data_files:
- split: train
path: data-0.95/fry_Latn.txt
- config_name: fub_Latn
data_files:
- split: train
path: data-0.95/fub_Latn.txt
- config_name: fud_Latn
data_files:
- split: train
path: data-0.95/fud_Latn.txt
- config_name: fue_Latn
data_files:
- split: train
path: data-0.95/fue_Latn.txt
- config_name: fuf_Latn
data_files:
- split: train
path: data-0.95/fuf_Latn.txt
- config_name: fuh_Latn
data_files:
- split: train
path: data-0.95/fuh_Latn.txt
- config_name: fuq_Latn
data_files:
- split: train
path: data-0.95/fuq_Latn.txt
- config_name: fur_Latn
data_files:
- split: train
path: data-0.95/fur_Latn.txt
- config_name: fuv_Arab
data_files:
- split: train
path: data-0.95/fuv_Arab.txt
- config_name: fuv_Latn
data_files:
- split: train
path: data-0.95/fuv_Latn.txt
- config_name: gaa_Latn
data_files:
- split: train
path: data-0.95/gaa_Latn.txt
- config_name: gag_Cyrl
data_files:
- split: train
path: data-0.95/gag_Cyrl.txt
- config_name: gag_Latn
data_files:
- split: train
path: data-0.95/gag_Latn.txt
- config_name: gah_Latn
data_files:
- split: train
path: data-0.95/gah_Latn.txt
- config_name: gai_Latn
data_files:
- split: train
path: data-0.95/gai_Latn.txt
- config_name: gam_Latn
data_files:
- split: train
path: data-0.95/gam_Latn.txt
- config_name: gaw_Latn
data_files:
- split: train
path: data-0.95/gaw_Latn.txt
- config_name: gaz_Latn
data_files:
- split: train
path: data-0.95/gaz_Latn.txt
- config_name: gbi_Latn
data_files:
- split: train
path: data-0.95/gbi_Latn.txt
- config_name: gbo_Latn
data_files:
- split: train
path: data-0.95/gbo_Latn.txt
- config_name: gbr_Latn
data_files:
- split: train
path: data-0.95/gbr_Latn.txt
- config_name: gcf_Latn
data_files:
- split: train
path: data-0.95/gcf_Latn.txt
- config_name: gcr_Latn
data_files:
- split: train
path: data-0.95/gcr_Latn.txt
- config_name: gde_Latn
data_files:
- split: train
path: data-0.95/gde_Latn.txt
- config_name: gdg_Latn
data_files:
- split: train
path: data-0.95/gdg_Latn.txt
- config_name: gdn_Latn
data_files:
- split: train
path: data-0.95/gdn_Latn.txt
- config_name: gdr_Latn
data_files:
- split: train
path: data-0.95/gdr_Latn.txt
- config_name: geb_Latn
data_files:
- split: train
path: data-0.95/geb_Latn.txt
- config_name: gej_Latn
data_files:
- split: train
path: data-0.95/gej_Latn.txt
- config_name: gfk_Latn
data_files:
- split: train
path: data-0.95/gfk_Latn.txt
- config_name: ghe_Deva
data_files:
- split: train
path: data-0.95/ghe_Deva.txt
- config_name: ghs_Latn
data_files:
- split: train
path: data-0.95/ghs_Latn.txt
- config_name: gid_Latn
data_files:
- split: train
path: data-0.95/gid_Latn.txt
- config_name: gil_Latn
data_files:
- split: train
path: data-0.95/gil_Latn.txt
- config_name: giz_Latn
data_files:
- split: train
path: data-0.95/giz_Latn.txt
- config_name: gjn_Latn
data_files:
- split: train
path: data-0.95/gjn_Latn.txt
- config_name: gkn_Latn
data_files:
- split: train
path: data-0.95/gkn_Latn.txt
- config_name: gla_Latn
data_files:
- split: train
path: data-0.95/gla_Latn.txt
- config_name: gle_Latn
data_files:
- split: train
path: data-0.95/gle_Latn.txt
- config_name: glg_Latn
data_files:
- split: train
path: data-0.95/glg_Latn.txt
- config_name: glk_Arab
data_files:
- split: train
path: data-0.95/glk_Arab.txt
- config_name: glv_Latn
data_files:
- split: train
path: data-0.95/glv_Latn.txt
- config_name: gmh_Latn
data_files:
- split: train
path: data-0.95/gmh_Latn.txt
- config_name: gmv_Ethi
data_files:
- split: train
path: data-0.95/gmv_Ethi.txt
- config_name: gmv_Latn
data_files:
- split: train
path: data-0.95/gmv_Latn.txt
- config_name: gna_Latn
data_files:
- split: train
path: data-0.95/gna_Latn.txt
- config_name: gnb_Latn
data_files:
- split: train
path: data-0.95/gnb_Latn.txt
- config_name: gnd_Latn
data_files:
- split: train
path: data-0.95/gnd_Latn.txt
- config_name: gng_Latn
data_files:
- split: train
path: data-0.95/gng_Latn.txt
- config_name: gnn_Latn
data_files:
- split: train
path: data-0.95/gnn_Latn.txt
- config_name: gnw_Latn
data_files:
- split: train
path: data-0.95/gnw_Latn.txt
- config_name: goa_Latn
data_files:
- split: train
path: data-0.95/goa_Latn.txt
- config_name: gof_Ethi
data_files:
- split: train
path: data-0.95/gof_Ethi.txt
- config_name: gof_Latn
data_files:
- split: train
path: data-0.95/gof_Latn.txt
- config_name: gog_Latn
data_files:
- split: train
path: data-0.95/gog_Latn.txt
- config_name: goh_Latn
data_files:
- split: train
path: data-0.95/goh_Latn.txt
- config_name: gom_Deva
data_files:
- split: train
path: data-0.95/gom_Deva.txt
- config_name: gom_Latn
data_files:
- split: train
path: data-0.95/gom_Latn.txt
- config_name: gor_Latn
data_files:
- split: train
path: data-0.95/gor_Latn.txt
- config_name: gos_Latn
data_files:
- split: train
path: data-0.95/gos_Latn.txt
- config_name: got_Goth
data_files:
- split: train
path: data-0.95/got_Goth.txt
- config_name: got_Latn
data_files:
- split: train
path: data-0.95/got_Latn.txt
- config_name: gqr_Latn
data_files:
- split: train
path: data-0.95/gqr_Latn.txt
- config_name: grc_Grek
data_files:
- split: train
path: data-0.95/grc_Grek.txt
- config_name: grt_Beng
data_files:
- split: train
path: data-0.95/grt_Beng.txt
- config_name: gso_Latn
data_files:
- split: train
path: data-0.95/gso_Latn.txt
- config_name: gsw_Latn
data_files:
- split: train
path: data-0.95/gsw_Latn.txt
- config_name: gub_Latn
data_files:
- split: train
path: data-0.95/gub_Latn.txt
- config_name: guc_Latn
data_files:
- split: train
path: data-0.95/guc_Latn.txt
- config_name: gud_Latn
data_files:
- split: train
path: data-0.95/gud_Latn.txt
- config_name: gug_Latn
data_files:
- split: train
path: data-0.95/gug_Latn.txt
- config_name: guh_Latn
data_files:
- split: train
path: data-0.95/guh_Latn.txt
- config_name: gui_Latn
data_files:
- split: train
path: data-0.95/gui_Latn.txt
- config_name: guj_Gujr
data_files:
- split: train
path: data-0.95/guj_Gujr.txt
- config_name: guj_Latn
data_files:
- split: train
path: data-0.95/guj_Latn.txt
- config_name: guk_Ethi
data_files:
- split: train
path: data-0.95/guk_Ethi.txt
- config_name: gul_Latn
data_files:
- split: train
path: data-0.95/gul_Latn.txt
- config_name: gum_Latn
data_files:
- split: train
path: data-0.95/gum_Latn.txt
- config_name: gun_Latn
data_files:
- split: train
path: data-0.95/gun_Latn.txt
- config_name: guo_Latn
data_files:
- split: train
path: data-0.95/guo_Latn.txt
- config_name: guq_Latn
data_files:
- split: train
path: data-0.95/guq_Latn.txt
- config_name: gur_Latn
data_files:
- split: train
path: data-0.95/gur_Latn.txt
- config_name: guu_Latn
data_files:
- split: train
path: data-0.95/guu_Latn.txt
- config_name: guw_Latn
data_files:
- split: train
path: data-0.95/guw_Latn.txt
- config_name: gux_Latn
data_files:
- split: train
path: data-0.95/gux_Latn.txt
- config_name: guz_Latn
data_files:
- split: train
path: data-0.95/guz_Latn.txt
- config_name: gvc_Latn
data_files:
- split: train
path: data-0.95/gvc_Latn.txt
- config_name: gvf_Latn
data_files:
- split: train
path: data-0.95/gvf_Latn.txt
- config_name: gvl_Latn
data_files:
- split: train
path: data-0.95/gvl_Latn.txt
- config_name: gvn_Latn
data_files:
- split: train
path: data-0.95/gvn_Latn.txt
- config_name: gwi_Latn
data_files:
- split: train
path: data-0.95/gwi_Latn.txt
- config_name: gwr_Latn
data_files:
- split: train
path: data-0.95/gwr_Latn.txt
- config_name: gya_Latn
data_files:
- split: train
path: data-0.95/gya_Latn.txt
- config_name: gym_Latn
data_files:
- split: train
path: data-0.95/gym_Latn.txt
- config_name: gyr_Latn
data_files:
- split: train
path: data-0.95/gyr_Latn.txt
- config_name: hac_Arab
data_files:
- split: train
path: data-0.95/hac_Arab.txt
- config_name: hae_Latn
data_files:
- split: train
path: data-0.95/hae_Latn.txt
- config_name: hag_Latn
data_files:
- split: train
path: data-0.95/hag_Latn.txt
- config_name: hak_Hani
data_files:
- split: train
path: data-0.95/hak_Hani.txt
- config_name: hak_Latn
data_files:
- split: train
path: data-0.95/hak_Latn.txt
- config_name: hat_Latn
data_files:
- split: train
path: data-0.95/hat_Latn.txt
- config_name: hau_Latn
data_files:
- split: train
path: data-0.95/hau_Latn.txt
- config_name: hav_Latn
data_files:
- split: train
path: data-0.95/hav_Latn.txt
- config_name: haw_Latn
data_files:
- split: train
path: data-0.95/haw_Latn.txt
- config_name: hay_Latn
data_files:
- split: train
path: data-0.95/hay_Latn.txt
- config_name: hbo_Hebr
data_files:
- split: train
path: data-0.95/hbo_Hebr.txt
- config_name: hbs_Latn
data_files:
- split: train
path: data-0.95/hbs_Latn.txt
- config_name: hch_Latn
data_files:
- split: train
path: data-0.95/hch_Latn.txt
- config_name: heb_Hebr
data_files:
- split: train
path: data-0.95/heb_Hebr.txt
- config_name: heg_Latn
data_files:
- split: train
path: data-0.95/heg_Latn.txt
- config_name: heh_Latn
data_files:
- split: train
path: data-0.95/heh_Latn.txt
- config_name: her_Latn
data_files:
- split: train
path: data-0.95/her_Latn.txt
- config_name: hif_Latn
data_files:
- split: train
path: data-0.95/hif_Latn.txt
- config_name: hig_Latn
data_files:
- split: train
path: data-0.95/hig_Latn.txt
- config_name: hil_Latn
data_files:
- split: train
path: data-0.95/hil_Latn.txt
- config_name: hin_Deva
data_files:
- split: train
path: data-0.95/hin_Deva.txt
- config_name: hin_Latn
data_files:
- split: train
path: data-0.95/hin_Latn.txt
- config_name: hix_Latn
data_files:
- split: train
path: data-0.95/hix_Latn.txt
- config_name: hla_Latn
data_files:
- split: train
path: data-0.95/hla_Latn.txt
- config_name: hlt_Latn
data_files:
- split: train
path: data-0.95/hlt_Latn.txt
- config_name: hmo_Latn
data_files:
- split: train
path: data-0.95/hmo_Latn.txt
- config_name: hmr_Latn
data_files:
- split: train
path: data-0.95/hmr_Latn.txt
- config_name: hne_Deva
data_files:
- split: train
path: data-0.95/hne_Deva.txt
- config_name: hnj_Latn
data_files:
- split: train
path: data-0.95/hnj_Latn.txt
- config_name: hnn_Latn
data_files:
- split: train
path: data-0.95/hnn_Latn.txt
- config_name: hns_Latn
data_files:
- split: train
path: data-0.95/hns_Latn.txt
- config_name: hoc_Latn
data_files:
- split: train
path: data-0.95/hoc_Latn.txt
- config_name: hoc_Wara
data_files:
- split: train
path: data-0.95/hoc_Wara.txt
- config_name: hop_Latn
data_files:
- split: train
path: data-0.95/hop_Latn.txt
- config_name: hot_Latn
data_files:
- split: train
path: data-0.95/hot_Latn.txt
- config_name: hra_Latn
data_files:
- split: train
path: data-0.95/hra_Latn.txt
- config_name: hrv_Latn
data_files:
- split: train
path: data-0.95/hrv_Latn.txt
- config_name: hrx_Latn
data_files:
- split: train
path: data-0.95/hrx_Latn.txt
- config_name: hsb_Latn
data_files:
- split: train
path: data-0.95/hsb_Latn.txt
- config_name: hto_Latn
data_files:
- split: train
path: data-0.95/hto_Latn.txt
- config_name: hub_Latn
data_files:
- split: train
path: data-0.95/hub_Latn.txt
- config_name: hui_Latn
data_files:
- split: train
path: data-0.95/hui_Latn.txt
- config_name: hun_Hung
data_files:
- split: train
path: data-0.95/hun_Hung.txt
- config_name: hun_Latn
data_files:
- split: train
path: data-0.95/hun_Latn.txt
- config_name: hus_Latn
data_files:
- split: train
path: data-0.95/hus_Latn.txt
- config_name: huu_Latn
data_files:
- split: train
path: data-0.95/huu_Latn.txt
- config_name: huv_Latn
data_files:
- split: train
path: data-0.95/huv_Latn.txt
- config_name: hvn_Latn
data_files:
- split: train
path: data-0.95/hvn_Latn.txt
- config_name: hwc_Latn
data_files:
- split: train
path: data-0.95/hwc_Latn.txt
- config_name: hye_Armn
data_files:
- split: train
path: data-0.95/hye_Armn.txt
- config_name: hyw_Armn
data_files:
- split: train
path: data-0.95/hyw_Armn.txt
- config_name: ian_Latn
data_files:
- split: train
path: data-0.95/ian_Latn.txt
- config_name: iba_Latn
data_files:
- split: train
path: data-0.95/iba_Latn.txt
- config_name: ibg_Latn
data_files:
- split: train
path: data-0.95/ibg_Latn.txt
- config_name: ibo_Latn
data_files:
- split: train
path: data-0.95/ibo_Latn.txt
- config_name: icr_Latn
data_files:
- split: train
path: data-0.95/icr_Latn.txt
- config_name: ido_Latn
data_files:
- split: train
path: data-0.95/ido_Latn.txt
- config_name: idu_Latn
data_files:
- split: train
path: data-0.95/idu_Latn.txt
- config_name: ifa_Latn
data_files:
- split: train
path: data-0.95/ifa_Latn.txt
- config_name: ifb_Latn
data_files:
- split: train
path: data-0.95/ifb_Latn.txt
- config_name: ife_Latn
data_files:
- split: train
path: data-0.95/ife_Latn.txt
- config_name: ifk_Latn
data_files:
- split: train
path: data-0.95/ifk_Latn.txt
- config_name: ifu_Latn
data_files:
- split: train
path: data-0.95/ifu_Latn.txt
- config_name: ify_Latn
data_files:
- split: train
path: data-0.95/ify_Latn.txt
- config_name: ige_Latn
data_files:
- split: train
path: data-0.95/ige_Latn.txt
- config_name: ign_Latn
data_files:
- split: train
path: data-0.95/ign_Latn.txt
- config_name: ike_Cans
data_files:
- split: train
path: data-0.95/ike_Cans.txt
- config_name: ikk_Latn
data_files:
- split: train
path: data-0.95/ikk_Latn.txt
- config_name: ikt_Latn
data_files:
- split: train
path: data-0.95/ikt_Latn.txt
- config_name: ikw_Latn
data_files:
- split: train
path: data-0.95/ikw_Latn.txt
- config_name: ilb_Latn
data_files:
- split: train
path: data-0.95/ilb_Latn.txt
- config_name: ile_Latn
data_files:
- split: train
path: data-0.95/ile_Latn.txt
- config_name: ilo_Latn
data_files:
- split: train
path: data-0.95/ilo_Latn.txt
- config_name: imo_Latn
data_files:
- split: train
path: data-0.95/imo_Latn.txt
- config_name: ina_Latn
data_files:
- split: train
path: data-0.95/ina_Latn.txt
- config_name: inb_Latn
data_files:
- split: train
path: data-0.95/inb_Latn.txt
- config_name: ind_Latn
data_files:
- split: train
path: data-0.95/ind_Latn.txt
- config_name: inh_Cyrl
data_files:
- split: train
path: data-0.95/inh_Cyrl.txt
- config_name: ino_Latn
data_files:
- split: train
path: data-0.95/ino_Latn.txt
- config_name: iou_Latn
data_files:
- split: train
path: data-0.95/iou_Latn.txt
- config_name: ipi_Latn
data_files:
- split: train
path: data-0.95/ipi_Latn.txt
- config_name: iqw_Latn
data_files:
- split: train
path: data-0.95/iqw_Latn.txt
- config_name: iri_Latn
data_files:
- split: train
path: data-0.95/iri_Latn.txt
- config_name: irk_Latn
data_files:
- split: train
path: data-0.95/irk_Latn.txt
- config_name: iry_Latn
data_files:
- split: train
path: data-0.95/iry_Latn.txt
- config_name: isd_Latn
data_files:
- split: train
path: data-0.95/isd_Latn.txt
- config_name: ish_Latn
data_files:
- split: train
path: data-0.95/ish_Latn.txt
- config_name: isl_Latn
data_files:
- split: train
path: data-0.95/isl_Latn.txt
- config_name: iso_Latn
data_files:
- split: train
path: data-0.95/iso_Latn.txt
- config_name: ita_Latn
data_files:
- split: train
path: data-0.95/ita_Latn.txt
- config_name: itl_Cyrl
data_files:
- split: train
path: data-0.95/itl_Cyrl.txt
- config_name: itv_Latn
data_files:
- split: train
path: data-0.95/itv_Latn.txt
- config_name: ium_Latn
data_files:
- split: train
path: data-0.95/ium_Latn.txt
- config_name: ivb_Latn
data_files:
- split: train
path: data-0.95/ivb_Latn.txt
- config_name: ivv_Latn
data_files:
- split: train
path: data-0.95/ivv_Latn.txt
- config_name: iws_Latn
data_files:
- split: train
path: data-0.95/iws_Latn.txt
- config_name: ixl_Latn
data_files:
- split: train
path: data-0.95/ixl_Latn.txt
- config_name: izr_Latn
data_files:
- split: train
path: data-0.95/izr_Latn.txt
- config_name: izz_Latn
data_files:
- split: train
path: data-0.95/izz_Latn.txt
- config_name: jaa_Latn
data_files:
- split: train
path: data-0.95/jaa_Latn.txt
- config_name: jac_Latn
data_files:
- split: train
path: data-0.95/jac_Latn.txt
- config_name: jae_Latn
data_files:
- split: train
path: data-0.95/jae_Latn.txt
- config_name: jam_Latn
data_files:
- split: train
path: data-0.95/jam_Latn.txt
- config_name: jav_Java
data_files:
- split: train
path: data-0.95/jav_Java.txt
- config_name: jav_Latn
data_files:
- split: train
path: data-0.95/jav_Latn.txt
- config_name: jbo_Latn
data_files:
- split: train
path: data-0.95/jbo_Latn.txt
- config_name: jbu_Latn
data_files:
- split: train
path: data-0.95/jbu_Latn.txt
- config_name: jic_Latn
data_files:
- split: train
path: data-0.95/jic_Latn.txt
- config_name: jiv_Latn
data_files:
- split: train
path: data-0.95/jiv_Latn.txt
- config_name: jmc_Latn
data_files:
- split: train
path: data-0.95/jmc_Latn.txt
- config_name: jpn_Jpan
data_files:
- split: train
path: data-0.95/jpn_Jpan.txt
- config_name: jra_Latn
data_files:
- split: train
path: data-0.95/jra_Latn.txt
- config_name: jun_Orya
data_files:
- split: train
path: data-0.95/jun_Orya.txt
- config_name: jvn_Latn
data_files:
- split: train
path: data-0.95/jvn_Latn.txt
- config_name: kaa_Cyrl
data_files:
- split: train
path: data-0.95/kaa_Cyrl.txt
- config_name: kaa_Latn
data_files:
- split: train
path: data-0.95/kaa_Latn.txt
- config_name: kab_Latn
data_files:
- split: train
path: data-0.95/kab_Latn.txt
- config_name: kac_Latn
data_files:
- split: train
path: data-0.95/kac_Latn.txt
- config_name: kak_Latn
data_files:
- split: train
path: data-0.95/kak_Latn.txt
- config_name: kal_Latn
data_files:
- split: train
path: data-0.95/kal_Latn.txt
- config_name: kam_Latn
data_files:
- split: train
path: data-0.95/kam_Latn.txt
- config_name: kan_Knda
data_files:
- split: train
path: data-0.95/kan_Knda.txt
- config_name: kan_Latn
data_files:
- split: train
path: data-0.95/kan_Latn.txt
- config_name: kao_Latn
data_files:
- split: train
path: data-0.95/kao_Latn.txt
- config_name: kap_Cyrl
data_files:
- split: train
path: data-0.95/kap_Cyrl.txt
- config_name: kaq_Latn
data_files:
- split: train
path: data-0.95/kaq_Latn.txt
- config_name: kas_Arab
data_files:
- split: train
path: data-0.95/kas_Arab.txt
- config_name: kas_Deva
data_files:
- split: train
path: data-0.95/kas_Deva.txt
- config_name: kas_Latn
data_files:
- split: train
path: data-0.95/kas_Latn.txt
- config_name: kat_Geor
data_files:
- split: train
path: data-0.95/kat_Geor.txt
- config_name: kaz_Cyrl
data_files:
- split: train
path: data-0.95/kaz_Cyrl.txt
- config_name: kbc_Latn
data_files:
- split: train
path: data-0.95/kbc_Latn.txt
- config_name: kbd_Cyrl
data_files:
- split: train
path: data-0.95/kbd_Cyrl.txt
- config_name: kbh_Latn
data_files:
- split: train
path: data-0.95/kbh_Latn.txt
- config_name: kbm_Latn
data_files:
- split: train
path: data-0.95/kbm_Latn.txt
- config_name: kbo_Latn
data_files:
- split: train
path: data-0.95/kbo_Latn.txt
- config_name: kbp_Latn
data_files:
- split: train
path: data-0.95/kbp_Latn.txt
- config_name: kbq_Latn
data_files:
- split: train
path: data-0.95/kbq_Latn.txt
- config_name: kbr_Latn
data_files:
- split: train
path: data-0.95/kbr_Latn.txt
- config_name: kby_Latn
data_files:
- split: train
path: data-0.95/kby_Latn.txt
- config_name: kca_Cyrl
data_files:
- split: train
path: data-0.95/kca_Cyrl.txt
- config_name: kcg_Latn
data_files:
- split: train
path: data-0.95/kcg_Latn.txt
- config_name: kck_Latn
data_files:
- split: train
path: data-0.95/kck_Latn.txt
- config_name: kdc_Latn
data_files:
- split: train
path: data-0.95/kdc_Latn.txt
- config_name: kde_Latn
data_files:
- split: train
path: data-0.95/kde_Latn.txt
- config_name: kdh_Latn
data_files:
- split: train
path: data-0.95/kdh_Latn.txt
- config_name: kdi_Latn
data_files:
- split: train
path: data-0.95/kdi_Latn.txt
- config_name: kdj_Latn
data_files:
- split: train
path: data-0.95/kdj_Latn.txt
- config_name: kdl_Latn
data_files:
- split: train
path: data-0.95/kdl_Latn.txt
- config_name: kdr_Latn
data_files:
- split: train
path: data-0.95/kdr_Latn.txt
- config_name: kea_Latn
data_files:
- split: train
path: data-0.95/kea_Latn.txt
- config_name: kei_Latn
data_files:
- split: train
path: data-0.95/kei_Latn.txt
- config_name: kek_Latn
data_files:
- split: train
path: data-0.95/kek_Latn.txt
- config_name: ken_Latn
data_files:
- split: train
path: data-0.95/ken_Latn.txt
- config_name: keo_Latn
data_files:
- split: train
path: data-0.95/keo_Latn.txt
- config_name: ker_Latn
data_files:
- split: train
path: data-0.95/ker_Latn.txt
- config_name: kew_Latn
data_files:
- split: train
path: data-0.95/kew_Latn.txt
- config_name: kex_Deva
data_files:
- split: train
path: data-0.95/kex_Deva.txt
- config_name: kez_Latn
data_files:
- split: train
path: data-0.95/kez_Latn.txt
- config_name: kff_Telu
data_files:
- split: train
path: data-0.95/kff_Telu.txt
- config_name: kgf_Latn
data_files:
- split: train
path: data-0.95/kgf_Latn.txt
- config_name: kgk_Latn
data_files:
- split: train
path: data-0.95/kgk_Latn.txt
- config_name: kgp_Latn
data_files:
- split: train
path: data-0.95/kgp_Latn.txt
- config_name: kgr_Latn
data_files:
- split: train
path: data-0.95/kgr_Latn.txt
- config_name: kha_Latn
data_files:
- split: train
path: data-0.95/kha_Latn.txt
- config_name: khk_Cyrl
data_files:
- split: train
path: data-0.95/khk_Cyrl.txt
- config_name: khm_Khmr
data_files:
- split: train
path: data-0.95/khm_Khmr.txt
- config_name: khq_Latn
data_files:
- split: train
path: data-0.95/khq_Latn.txt
- config_name: khs_Latn
data_files:
- split: train
path: data-0.95/khs_Latn.txt
- config_name: khy_Latn
data_files:
- split: train
path: data-0.95/khy_Latn.txt
- config_name: khz_Latn
data_files:
- split: train
path: data-0.95/khz_Latn.txt
- config_name: kia_Latn
data_files:
- split: train
path: data-0.95/kia_Latn.txt
- config_name: kij_Latn
data_files:
- split: train
path: data-0.95/kij_Latn.txt
- config_name: kik_Latn
data_files:
- split: train
path: data-0.95/kik_Latn.txt
- config_name: kin_Latn
data_files:
- split: train
path: data-0.95/kin_Latn.txt
- config_name: kir_Cyrl
data_files:
- split: train
path: data-0.95/kir_Cyrl.txt
- config_name: kiu_Latn
data_files:
- split: train
path: data-0.95/kiu_Latn.txt
- config_name: kix_Latn
data_files:
- split: train
path: data-0.95/kix_Latn.txt
- config_name: kjb_Latn
data_files:
- split: train
path: data-0.95/kjb_Latn.txt
- config_name: kje_Latn
data_files:
- split: train
path: data-0.95/kje_Latn.txt
- config_name: kjh_Cyrl
data_files:
- split: train
path: data-0.95/kjh_Cyrl.txt
- config_name: kjs_Latn
data_files:
- split: train
path: data-0.95/kjs_Latn.txt
- config_name: kkc_Latn
data_files:
- split: train
path: data-0.95/kkc_Latn.txt
- config_name: kki_Latn
data_files:
- split: train
path: data-0.95/kki_Latn.txt
- config_name: kkj_Latn
data_files:
- split: train
path: data-0.95/kkj_Latn.txt
- config_name: kkl_Latn
data_files:
- split: train
path: data-0.95/kkl_Latn.txt
- config_name: kle_Deva
data_files:
- split: train
path: data-0.95/kle_Deva.txt
- config_name: klt_Latn
data_files:
- split: train
path: data-0.95/klt_Latn.txt
- config_name: klv_Latn
data_files:
- split: train
path: data-0.95/klv_Latn.txt
- config_name: kma_Latn
data_files:
- split: train
path: data-0.95/kma_Latn.txt
- config_name: kmb_Latn
data_files:
- split: train
path: data-0.95/kmb_Latn.txt
- config_name: kmd_Latn
data_files:
- split: train
path: data-0.95/kmd_Latn.txt
- config_name: kmg_Latn
data_files:
- split: train
path: data-0.95/kmg_Latn.txt
- config_name: kmh_Latn
data_files:
- split: train
path: data-0.95/kmh_Latn.txt
- config_name: kmk_Latn
data_files:
- split: train
path: data-0.95/kmk_Latn.txt
- config_name: kmm_Latn
data_files:
- split: train
path: data-0.95/kmm_Latn.txt
- config_name: kmo_Latn
data_files:
- split: train
path: data-0.95/kmo_Latn.txt
- config_name: kmr_Cyrl
data_files:
- split: train
path: data-0.95/kmr_Cyrl.txt
- config_name: kmr_Latn
data_files:
- split: train
path: data-0.95/kmr_Latn.txt
- config_name: kms_Latn
data_files:
- split: train
path: data-0.95/kms_Latn.txt
- config_name: kmu_Latn
data_files:
- split: train
path: data-0.95/kmu_Latn.txt
- config_name: kmy_Latn
data_files:
- split: train
path: data-0.95/kmy_Latn.txt
- config_name: knc_Arab
data_files:
- split: train
path: data-0.95/knc_Arab.txt
- config_name: knc_Latn
data_files:
- split: train
path: data-0.95/knc_Latn.txt
- config_name: kne_Latn
data_files:
- split: train
path: data-0.95/kne_Latn.txt
- config_name: knf_Latn
data_files:
- split: train
path: data-0.95/knf_Latn.txt
- config_name: kng_Latn
data_files:
- split: train
path: data-0.95/kng_Latn.txt
- config_name: knj_Latn
data_files:
- split: train
path: data-0.95/knj_Latn.txt
- config_name: knk_Latn
data_files:
- split: train
path: data-0.95/knk_Latn.txt
- config_name: kno_Latn
data_files:
- split: train
path: data-0.95/kno_Latn.txt
- config_name: knv_Latn
data_files:
- split: train
path: data-0.95/knv_Latn.txt
- config_name: knx_Latn
data_files:
- split: train
path: data-0.95/knx_Latn.txt
- config_name: kny_Latn
data_files:
- split: train
path: data-0.95/kny_Latn.txt
- config_name: kog_Latn
data_files:
- split: train
path: data-0.95/kog_Latn.txt
- config_name: koi_Cyrl
data_files:
- split: train
path: data-0.95/koi_Cyrl.txt
- config_name: koo_Latn
data_files:
- split: train
path: data-0.95/koo_Latn.txt
- config_name: kor_Hang
data_files:
- split: train
path: data-0.95/kor_Hang.txt
- config_name: kos_Latn
data_files:
- split: train
path: data-0.95/kos_Latn.txt
- config_name: kpe_Latn
data_files:
- split: train
path: data-0.95/kpe_Latn.txt
- config_name: kpf_Latn
data_files:
- split: train
path: data-0.95/kpf_Latn.txt
- config_name: kpg_Latn
data_files:
- split: train
path: data-0.95/kpg_Latn.txt
- config_name: kpj_Latn
data_files:
- split: train
path: data-0.95/kpj_Latn.txt
- config_name: kpq_Latn
data_files:
- split: train
path: data-0.95/kpq_Latn.txt
- config_name: kpr_Latn
data_files:
- split: train
path: data-0.95/kpr_Latn.txt
- config_name: kpv_Cyrl
data_files:
- split: train
path: data-0.95/kpv_Cyrl.txt
- config_name: kpw_Latn
data_files:
- split: train
path: data-0.95/kpw_Latn.txt
- config_name: kpx_Latn
data_files:
- split: train
path: data-0.95/kpx_Latn.txt
- config_name: kpz_Latn
data_files:
- split: train
path: data-0.95/kpz_Latn.txt
- config_name: kqa_Latn
data_files:
- split: train
path: data-0.95/kqa_Latn.txt
- config_name: kqc_Latn
data_files:
- split: train
path: data-0.95/kqc_Latn.txt
- config_name: kqe_Latn
data_files:
- split: train
path: data-0.95/kqe_Latn.txt
- config_name: kqf_Latn
data_files:
- split: train
path: data-0.95/kqf_Latn.txt
- config_name: kql_Latn
data_files:
- split: train
path: data-0.95/kql_Latn.txt
- config_name: kqn_Latn
data_files:
- split: train
path: data-0.95/kqn_Latn.txt
- config_name: kqo_Latn
data_files:
- split: train
path: data-0.95/kqo_Latn.txt
- config_name: kqp_Latn
data_files:
- split: train
path: data-0.95/kqp_Latn.txt
- config_name: kqs_Latn
data_files:
- split: train
path: data-0.95/kqs_Latn.txt
- config_name: kqw_Latn
data_files:
- split: train
path: data-0.95/kqw_Latn.txt
- config_name: kqy_Ethi
data_files:
- split: train
path: data-0.95/kqy_Ethi.txt
- config_name: krc_Cyrl
data_files:
- split: train
path: data-0.95/krc_Cyrl.txt
- config_name: kri_Latn
data_files:
- split: train
path: data-0.95/kri_Latn.txt
- config_name: krj_Latn
data_files:
- split: train
path: data-0.95/krj_Latn.txt
- config_name: krl_Latn
data_files:
- split: train
path: data-0.95/krl_Latn.txt
- config_name: kru_Deva
data_files:
- split: train
path: data-0.95/kru_Deva.txt
- config_name: krx_Latn
data_files:
- split: train
path: data-0.95/krx_Latn.txt
- config_name: ksb_Latn
data_files:
- split: train
path: data-0.95/ksb_Latn.txt
- config_name: ksc_Latn
data_files:
- split: train
path: data-0.95/ksc_Latn.txt
- config_name: ksd_Latn
data_files:
- split: train
path: data-0.95/ksd_Latn.txt
- config_name: ksf_Latn
data_files:
- split: train
path: data-0.95/ksf_Latn.txt
- config_name: ksh_Latn
data_files:
- split: train
path: data-0.95/ksh_Latn.txt
- config_name: ksj_Latn
data_files:
- split: train
path: data-0.95/ksj_Latn.txt
- config_name: ksp_Latn
data_files:
- split: train
path: data-0.95/ksp_Latn.txt
- config_name: ksr_Latn
data_files:
- split: train
path: data-0.95/ksr_Latn.txt
- config_name: kss_Latn
data_files:
- split: train
path: data-0.95/kss_Latn.txt
- config_name: ksw_Mymr
data_files:
- split: train
path: data-0.95/ksw_Mymr.txt
- config_name: ktb_Ethi
data_files:
- split: train
path: data-0.95/ktb_Ethi.txt
- config_name: ktj_Latn
data_files:
- split: train
path: data-0.95/ktj_Latn.txt
- config_name: ktm_Latn
data_files:
- split: train
path: data-0.95/ktm_Latn.txt
- config_name: kto_Latn
data_files:
- split: train
path: data-0.95/kto_Latn.txt
- config_name: ktu_Latn
data_files:
- split: train
path: data-0.95/ktu_Latn.txt
- config_name: ktz_Latn
data_files:
- split: train
path: data-0.95/ktz_Latn.txt
- config_name: kua_Latn
data_files:
- split: train
path: data-0.95/kua_Latn.txt
- config_name: kub_Latn
data_files:
- split: train
path: data-0.95/kub_Latn.txt
- config_name: kud_Latn
data_files:
- split: train
path: data-0.95/kud_Latn.txt
- config_name: kue_Latn
data_files:
- split: train
path: data-0.95/kue_Latn.txt
- config_name: kuj_Latn
data_files:
- split: train
path: data-0.95/kuj_Latn.txt
- config_name: kum_Cyrl
data_files:
- split: train
path: data-0.95/kum_Cyrl.txt
- config_name: kup_Latn
data_files:
- split: train
path: data-0.95/kup_Latn.txt
- config_name: kus_Latn
data_files:
- split: train
path: data-0.95/kus_Latn.txt
- config_name: kvg_Latn
data_files:
- split: train
path: data-0.95/kvg_Latn.txt
- config_name: kvj_Latn
data_files:
- split: train
path: data-0.95/kvj_Latn.txt
- config_name: kvn_Latn
data_files:
- split: train
path: data-0.95/kvn_Latn.txt
- config_name: kwd_Latn
data_files:
- split: train
path: data-0.95/kwd_Latn.txt
- config_name: kwf_Latn
data_files:
- split: train
path: data-0.95/kwf_Latn.txt
- config_name: kwi_Latn
data_files:
- split: train
path: data-0.95/kwi_Latn.txt
- config_name: kwj_Latn
data_files:
- split: train
path: data-0.95/kwj_Latn.txt
- config_name: kwn_Latn
data_files:
- split: train
path: data-0.95/kwn_Latn.txt
- config_name: kwy_Latn
data_files:
- split: train
path: data-0.95/kwy_Latn.txt
- config_name: kxc_Ethi
data_files:
- split: train
path: data-0.95/kxc_Ethi.txt
- config_name: kxm_Thai
data_files:
- split: train
path: data-0.95/kxm_Thai.txt
- config_name: kxw_Latn
data_files:
- split: train
path: data-0.95/kxw_Latn.txt
- config_name: kyc_Latn
data_files:
- split: train
path: data-0.95/kyc_Latn.txt
- config_name: kyf_Latn
data_files:
- split: train
path: data-0.95/kyf_Latn.txt
- config_name: kyg_Latn
data_files:
- split: train
path: data-0.95/kyg_Latn.txt
- config_name: kyq_Latn
data_files:
- split: train
path: data-0.95/kyq_Latn.txt
- config_name: kyu_Kali
data_files:
- split: train
path: data-0.95/kyu_Kali.txt
- config_name: kyu_Latn
data_files:
- split: train
path: data-0.95/kyu_Latn.txt
- config_name: kyu_Mymr
data_files:
- split: train
path: data-0.95/kyu_Mymr.txt
- config_name: kyz_Latn
data_files:
- split: train
path: data-0.95/kyz_Latn.txt
- config_name: kze_Latn
data_files:
- split: train
path: data-0.95/kze_Latn.txt
- config_name: kzf_Latn
data_files:
- split: train
path: data-0.95/kzf_Latn.txt
- config_name: kzj_Latn
data_files:
- split: train
path: data-0.95/kzj_Latn.txt
- config_name: kzn_Latn
data_files:
- split: train
path: data-0.95/kzn_Latn.txt
- config_name: lac_Latn
data_files:
- split: train
path: data-0.95/lac_Latn.txt
- config_name: lad_Hebr
data_files:
- split: train
path: data-0.95/lad_Hebr.txt
- config_name: lad_Latn
data_files:
- split: train
path: data-0.95/lad_Latn.txt
- config_name: lai_Latn
data_files:
- split: train
path: data-0.95/lai_Latn.txt
- config_name: laj_Latn
data_files:
- split: train
path: data-0.95/laj_Latn.txt
- config_name: lam_Latn
data_files:
- split: train
path: data-0.95/lam_Latn.txt
- config_name: lao_Laoo
data_files:
- split: train
path: data-0.95/lao_Laoo.txt
- config_name: lap_Latn
data_files:
- split: train
path: data-0.95/lap_Latn.txt
- config_name: las_Latn
data_files:
- split: train
path: data-0.95/las_Latn.txt
- config_name: lat_Latn
data_files:
- split: train
path: data-0.95/lat_Latn.txt
- config_name: law_Latn
data_files:
- split: train
path: data-0.95/law_Latn.txt
- config_name: lbb_Latn
data_files:
- split: train
path: data-0.95/lbb_Latn.txt
- config_name: lbe_Cyrl
data_files:
- split: train
path: data-0.95/lbe_Cyrl.txt
- config_name: lbj_Tibt
data_files:
- split: train
path: data-0.95/lbj_Tibt.txt
- config_name: lbk_Latn
data_files:
- split: train
path: data-0.95/lbk_Latn.txt
- config_name: lcm_Latn
data_files:
- split: train
path: data-0.95/lcm_Latn.txt
- config_name: lcp_Thai
data_files:
- split: train
path: data-0.95/lcp_Thai.txt
- config_name: ldi_Latn
data_files:
- split: train
path: data-0.95/ldi_Latn.txt
- config_name: ldn_Latn
data_files:
- split: train
path: data-0.95/ldn_Latn.txt
- config_name: lea_Latn
data_files:
- split: train
path: data-0.95/lea_Latn.txt
- config_name: led_Latn
data_files:
- split: train
path: data-0.95/led_Latn.txt
- config_name: lee_Latn
data_files:
- split: train
path: data-0.95/lee_Latn.txt
- config_name: lef_Latn
data_files:
- split: train
path: data-0.95/lef_Latn.txt
- config_name: leh_Latn
data_files:
- split: train
path: data-0.95/leh_Latn.txt
- config_name: lem_Latn
data_files:
- split: train
path: data-0.95/lem_Latn.txt
- config_name: leu_Latn
data_files:
- split: train
path: data-0.95/leu_Latn.txt
- config_name: lew_Latn
data_files:
- split: train
path: data-0.95/lew_Latn.txt
- config_name: lex_Latn
data_files:
- split: train
path: data-0.95/lex_Latn.txt
- config_name: lez_Cyrl
data_files:
- split: train
path: data-0.95/lez_Cyrl.txt
- config_name: lfn_Cyrl
data_files:
- split: train
path: data-0.95/lfn_Cyrl.txt
- config_name: lfn_Latn
data_files:
- split: train
path: data-0.95/lfn_Latn.txt
- config_name: lgg_Latn
data_files:
- split: train
path: data-0.95/lgg_Latn.txt
- config_name: lgl_Latn
data_files:
- split: train
path: data-0.95/lgl_Latn.txt
- config_name: lgm_Latn
data_files:
- split: train
path: data-0.95/lgm_Latn.txt
- config_name: lhi_Latn
data_files:
- split: train
path: data-0.95/lhi_Latn.txt
- config_name: lhu_Latn
data_files:
- split: train
path: data-0.95/lhu_Latn.txt
- config_name: lia_Latn
data_files:
- split: train
path: data-0.95/lia_Latn.txt
- config_name: lid_Latn
data_files:
- split: train
path: data-0.95/lid_Latn.txt
- config_name: lif_Deva
data_files:
- split: train
path: data-0.95/lif_Deva.txt
- config_name: lif_Limb
data_files:
- split: train
path: data-0.95/lif_Limb.txt
- config_name: lij_Latn
data_files:
- split: train
path: data-0.95/lij_Latn.txt
- config_name: lim_Latn
data_files:
- split: train
path: data-0.95/lim_Latn.txt
- config_name: lin_Latn
data_files:
- split: train
path: data-0.95/lin_Latn.txt
- config_name: lip_Latn
data_files:
- split: train
path: data-0.95/lip_Latn.txt
- config_name: lis_Lisu
data_files:
- split: train
path: data-0.95/lis_Lisu.txt
- config_name: lit_Latn
data_files:
- split: train
path: data-0.95/lit_Latn.txt
- config_name: liv_Latn
data_files:
- split: train
path: data-0.95/liv_Latn.txt
- config_name: ljp_Latn
data_files:
- split: train
path: data-0.95/ljp_Latn.txt
- config_name: lki_Arab
data_files:
- split: train
path: data-0.95/lki_Arab.txt
- config_name: llb_Latn
data_files:
- split: train
path: data-0.95/llb_Latn.txt
- config_name: lld_Latn
data_files:
- split: train
path: data-0.95/lld_Latn.txt
- config_name: llg_Latn
data_files:
- split: train
path: data-0.95/llg_Latn.txt
- config_name: lln_Latn
data_files:
- split: train
path: data-0.95/lln_Latn.txt
- config_name: lmk_Latn
data_files:
- split: train
path: data-0.95/lmk_Latn.txt
- config_name: lmo_Latn
data_files:
- split: train
path: data-0.95/lmo_Latn.txt
- config_name: lmp_Latn
data_files:
- split: train
path: data-0.95/lmp_Latn.txt
- config_name: lnd_Latn
data_files:
- split: train
path: data-0.95/lnd_Latn.txt
- config_name: lob_Latn
data_files:
- split: train
path: data-0.95/lob_Latn.txt
- config_name: loe_Latn
data_files:
- split: train
path: data-0.95/loe_Latn.txt
- config_name: log_Latn
data_files:
- split: train
path: data-0.95/log_Latn.txt
- config_name: lok_Latn
data_files:
- split: train
path: data-0.95/lok_Latn.txt
- config_name: lol_Latn
data_files:
- split: train
path: data-0.95/lol_Latn.txt
- config_name: lom_Latn
data_files:
- split: train
path: data-0.95/lom_Latn.txt
- config_name: loq_Latn
data_files:
- split: train
path: data-0.95/loq_Latn.txt
- config_name: loz_Latn
data_files:
- split: train
path: data-0.95/loz_Latn.txt
- config_name: lrc_Arab
data_files:
- split: train
path: data-0.95/lrc_Arab.txt
- config_name: lsi_Latn
data_files:
- split: train
path: data-0.95/lsi_Latn.txt
- config_name: lsm_Latn
data_files:
- split: train
path: data-0.95/lsm_Latn.txt
- config_name: ltg_Latn
data_files:
- split: train
path: data-0.95/ltg_Latn.txt
- config_name: ltz_Latn
data_files:
- split: train
path: data-0.95/ltz_Latn.txt
- config_name: lua_Latn
data_files:
- split: train
path: data-0.95/lua_Latn.txt
- config_name: lub_Latn
data_files:
- split: train
path: data-0.95/lub_Latn.txt
- config_name: luc_Latn
data_files:
- split: train
path: data-0.95/luc_Latn.txt
- config_name: lud_Latn
data_files:
- split: train
path: data-0.95/lud_Latn.txt
- config_name: lue_Latn
data_files:
- split: train
path: data-0.95/lue_Latn.txt
- config_name: lug_Latn
data_files:
- split: train
path: data-0.95/lug_Latn.txt
- config_name: lun_Latn
data_files:
- split: train
path: data-0.95/lun_Latn.txt
- config_name: luo_Latn
data_files:
- split: train
path: data-0.95/luo_Latn.txt
- config_name: lus_Latn
data_files:
- split: train
path: data-0.95/lus_Latn.txt
- config_name: lvs_Latn
data_files:
- split: train
path: data-0.95/lvs_Latn.txt
- config_name: lwg_Latn
data_files:
- split: train
path: data-0.95/lwg_Latn.txt
- config_name: lwo_Latn
data_files:
- split: train
path: data-0.95/lwo_Latn.txt
- config_name: lww_Latn
data_files:
- split: train
path: data-0.95/lww_Latn.txt
- config_name: lzh_Hani
data_files:
- split: train
path: data-0.95/lzh_Hani.txt
- config_name: maa_Latn
data_files:
- split: train
path: data-0.95/maa_Latn.txt
- config_name: mad_Latn
data_files:
- split: train
path: data-0.95/mad_Latn.txt
- config_name: maf_Latn
data_files:
- split: train
path: data-0.95/maf_Latn.txt
- config_name: mag_Deva
data_files:
- split: train
path: data-0.95/mag_Deva.txt
- config_name: mah_Latn
data_files:
- split: train
path: data-0.95/mah_Latn.txt
- config_name: mai_Deva
data_files:
- split: train
path: data-0.95/mai_Deva.txt
- config_name: maj_Latn
data_files:
- split: train
path: data-0.95/maj_Latn.txt
- config_name: mak_Latn
data_files:
- split: train
path: data-0.95/mak_Latn.txt
- config_name: mal_Latn
data_files:
- split: train
path: data-0.95/mal_Latn.txt
- config_name: mal_Mlym
data_files:
- split: train
path: data-0.95/mal_Mlym.txt
- config_name: mam_Latn
data_files:
- split: train
path: data-0.95/mam_Latn.txt
- config_name: maq_Latn
data_files:
- split: train
path: data-0.95/maq_Latn.txt
- config_name: mar_Deva
data_files:
- split: train
path: data-0.95/mar_Deva.txt
- config_name: mar_Latn
data_files:
- split: train
path: data-0.95/mar_Latn.txt
- config_name: mas_Latn
data_files:
- split: train
path: data-0.95/mas_Latn.txt
- config_name: mau_Latn
data_files:
- split: train
path: data-0.95/mau_Latn.txt
- config_name: mav_Latn
data_files:
- split: train
path: data-0.95/mav_Latn.txt
- config_name: maw_Latn
data_files:
- split: train
path: data-0.95/maw_Latn.txt
- config_name: max_Latn
data_files:
- split: train
path: data-0.95/max_Latn.txt
- config_name: maz_Latn
data_files:
- split: train
path: data-0.95/maz_Latn.txt
- config_name: mbb_Latn
data_files:
- split: train
path: data-0.95/mbb_Latn.txt
- config_name: mbc_Latn
data_files:
- split: train
path: data-0.95/mbc_Latn.txt
- config_name: mbd_Latn
data_files:
- split: train
path: data-0.95/mbd_Latn.txt
- config_name: mbf_Latn
data_files:
- split: train
path: data-0.95/mbf_Latn.txt
- config_name: mbh_Latn
data_files:
- split: train
path: data-0.95/mbh_Latn.txt
- config_name: mbi_Latn
data_files:
- split: train
path: data-0.95/mbi_Latn.txt
- config_name: mbj_Latn
data_files:
- split: train
path: data-0.95/mbj_Latn.txt
- config_name: mbl_Latn
data_files:
- split: train
path: data-0.95/mbl_Latn.txt
- config_name: mbs_Latn
data_files:
- split: train
path: data-0.95/mbs_Latn.txt
- config_name: mbt_Latn
data_files:
- split: train
path: data-0.95/mbt_Latn.txt
- config_name: mca_Latn
data_files:
- split: train
path: data-0.95/mca_Latn.txt
- config_name: mcb_Latn
data_files:
- split: train
path: data-0.95/mcb_Latn.txt
- config_name: mcd_Latn
data_files:
- split: train
path: data-0.95/mcd_Latn.txt
- config_name: mcf_Latn
data_files:
- split: train
path: data-0.95/mcf_Latn.txt
- config_name: mck_Latn
data_files:
- split: train
path: data-0.95/mck_Latn.txt
- config_name: mcn_Latn
data_files:
- split: train
path: data-0.95/mcn_Latn.txt
- config_name: mco_Latn
data_files:
- split: train
path: data-0.95/mco_Latn.txt
- config_name: mcp_Latn
data_files:
- split: train
path: data-0.95/mcp_Latn.txt
- config_name: mcq_Latn
data_files:
- split: train
path: data-0.95/mcq_Latn.txt
- config_name: mcu_Latn
data_files:
- split: train
path: data-0.95/mcu_Latn.txt
- config_name: mda_Latn
data_files:
- split: train
path: data-0.95/mda_Latn.txt
- config_name: mdf_Cyrl
data_files:
- split: train
path: data-0.95/mdf_Cyrl.txt
- config_name: mdy_Ethi
data_files:
- split: train
path: data-0.95/mdy_Ethi.txt
- config_name: med_Latn
data_files:
- split: train
path: data-0.95/med_Latn.txt
- config_name: mee_Latn
data_files:
- split: train
path: data-0.95/mee_Latn.txt
- config_name: mej_Latn
data_files:
- split: train
path: data-0.95/mej_Latn.txt
- config_name: mek_Latn
data_files:
- split: train
path: data-0.95/mek_Latn.txt
- config_name: men_Latn
data_files:
- split: train
path: data-0.95/men_Latn.txt
- config_name: meq_Latn
data_files:
- split: train
path: data-0.95/meq_Latn.txt
- config_name: mer_Latn
data_files:
- split: train
path: data-0.95/mer_Latn.txt
- config_name: met_Latn
data_files:
- split: train
path: data-0.95/met_Latn.txt
- config_name: meu_Latn
data_files:
- split: train
path: data-0.95/meu_Latn.txt
- config_name: mev_Latn
data_files:
- split: train
path: data-0.95/mev_Latn.txt
- config_name: mfe_Latn
data_files:
- split: train
path: data-0.95/mfe_Latn.txt
- config_name: mfg_Latn
data_files:
- split: train
path: data-0.95/mfg_Latn.txt
- config_name: mfh_Latn
data_files:
- split: train
path: data-0.95/mfh_Latn.txt
- config_name: mfi_Latn
data_files:
- split: train
path: data-0.95/mfi_Latn.txt
- config_name: mfk_Latn
data_files:
- split: train
path: data-0.95/mfk_Latn.txt
- config_name: mfq_Latn
data_files:
- split: train
path: data-0.95/mfq_Latn.txt
- config_name: mfy_Latn
data_files:
- split: train
path: data-0.95/mfy_Latn.txt
- config_name: mfz_Latn
data_files:
- split: train
path: data-0.95/mfz_Latn.txt
- config_name: mgc_Latn
data_files:
- split: train
path: data-0.95/mgc_Latn.txt
- config_name: mgh_Latn
data_files:
- split: train
path: data-0.95/mgh_Latn.txt
- config_name: mgm_Latn
data_files:
- split: train
path: data-0.95/mgm_Latn.txt
- config_name: mgo_Latn
data_files:
- split: train
path: data-0.95/mgo_Latn.txt
- config_name: mgr_Latn
data_files:
- split: train
path: data-0.95/mgr_Latn.txt
- config_name: mhi_Latn
data_files:
- split: train
path: data-0.95/mhi_Latn.txt
- config_name: mhl_Latn
data_files:
- split: train
path: data-0.95/mhl_Latn.txt
- config_name: mhr_Cyrl
data_files:
- split: train
path: data-0.95/mhr_Cyrl.txt
- config_name: mhw_Latn
data_files:
- split: train
path: data-0.95/mhw_Latn.txt
- config_name: mhx_Latn
data_files:
- split: train
path: data-0.95/mhx_Latn.txt
- config_name: mhy_Latn
data_files:
- split: train
path: data-0.95/mhy_Latn.txt
- config_name: mib_Latn
data_files:
- split: train
path: data-0.95/mib_Latn.txt
- config_name: mic_Latn
data_files:
- split: train
path: data-0.95/mic_Latn.txt
- config_name: mie_Latn
data_files:
- split: train
path: data-0.95/mie_Latn.txt
- config_name: mif_Latn
data_files:
- split: train
path: data-0.95/mif_Latn.txt
- config_name: mig_Latn
data_files:
- split: train
path: data-0.95/mig_Latn.txt
- config_name: mih_Latn
data_files:
- split: train
path: data-0.95/mih_Latn.txt
- config_name: mil_Latn
data_files:
- split: train
path: data-0.95/mil_Latn.txt
- config_name: mim_Latn
data_files:
- split: train
path: data-0.95/mim_Latn.txt
- config_name: min_Arab
data_files:
- split: train
path: data-0.95/min_Arab.txt
- config_name: min_Latn
data_files:
- split: train
path: data-0.95/min_Latn.txt
- config_name: mio_Latn
data_files:
- split: train
path: data-0.95/mio_Latn.txt
- config_name: mip_Latn
data_files:
- split: train
path: data-0.95/mip_Latn.txt
- config_name: miq_Latn
data_files:
- split: train
path: data-0.95/miq_Latn.txt
- config_name: mir_Latn
data_files:
- split: train
path: data-0.95/mir_Latn.txt
- config_name: mit_Latn
data_files:
- split: train
path: data-0.95/mit_Latn.txt
- config_name: miy_Latn
data_files:
- split: train
path: data-0.95/miy_Latn.txt
- config_name: miz_Latn
data_files:
- split: train
path: data-0.95/miz_Latn.txt
- config_name: mjc_Latn
data_files:
- split: train
path: data-0.95/mjc_Latn.txt
- config_name: mjw_Latn
data_files:
- split: train
path: data-0.95/mjw_Latn.txt
- config_name: mkd_Cyrl
data_files:
- split: train
path: data-0.95/mkd_Cyrl.txt
- config_name: mkl_Latn
data_files:
- split: train
path: data-0.95/mkl_Latn.txt
- config_name: mkn_Latn
data_files:
- split: train
path: data-0.95/mkn_Latn.txt
- config_name: mks_Latn
data_files:
- split: train
path: data-0.95/mks_Latn.txt
- config_name: mkz_Latn
data_files:
- split: train
path: data-0.95/mkz_Latn.txt
- config_name: mlh_Latn
data_files:
- split: train
path: data-0.95/mlh_Latn.txt
- config_name: mlp_Latn
data_files:
- split: train
path: data-0.95/mlp_Latn.txt
- config_name: mlt_Latn
data_files:
- split: train
path: data-0.95/mlt_Latn.txt
- config_name: mlu_Latn
data_files:
- split: train
path: data-0.95/mlu_Latn.txt
- config_name: mmn_Latn
data_files:
- split: train
path: data-0.95/mmn_Latn.txt
- config_name: mmo_Latn
data_files:
- split: train
path: data-0.95/mmo_Latn.txt
- config_name: mmx_Latn
data_files:
- split: train
path: data-0.95/mmx_Latn.txt
- config_name: mna_Latn
data_files:
- split: train
path: data-0.95/mna_Latn.txt
- config_name: mnb_Latn
data_files:
- split: train
path: data-0.95/mnb_Latn.txt
- config_name: mnf_Latn
data_files:
- split: train
path: data-0.95/mnf_Latn.txt
- config_name: mni_Beng
data_files:
- split: train
path: data-0.95/mni_Beng.txt
- config_name: mni_Latn
data_files:
- split: train
path: data-0.95/mni_Latn.txt
- config_name: mni_Mtei
data_files:
- split: train
path: data-0.95/mni_Mtei.txt
- config_name: mnk_Latn
data_files:
- split: train
path: data-0.95/mnk_Latn.txt
- config_name: mns_Cyrl
data_files:
- split: train
path: data-0.95/mns_Cyrl.txt
- config_name: mnw_Mymr
data_files:
- split: train
path: data-0.95/mnw_Mymr.txt
- config_name: mnx_Latn
data_files:
- split: train
path: data-0.95/mnx_Latn.txt
- config_name: mny_Latn
data_files:
- split: train
path: data-0.95/mny_Latn.txt
- config_name: moa_Latn
data_files:
- split: train
path: data-0.95/moa_Latn.txt
- config_name: moc_Latn
data_files:
- split: train
path: data-0.95/moc_Latn.txt
- config_name: mog_Latn
data_files:
- split: train
path: data-0.95/mog_Latn.txt
- config_name: moh_Latn
data_files:
- split: train
path: data-0.95/moh_Latn.txt
- config_name: mop_Latn
data_files:
- split: train
path: data-0.95/mop_Latn.txt
- config_name: mor_Latn
data_files:
- split: train
path: data-0.95/mor_Latn.txt
- config_name: mos_Latn
data_files:
- split: train
path: data-0.95/mos_Latn.txt
- config_name: mox_Latn
data_files:
- split: train
path: data-0.95/mox_Latn.txt
- config_name: mpg_Latn
data_files:
- split: train
path: data-0.95/mpg_Latn.txt
- config_name: mph_Latn
data_files:
- split: train
path: data-0.95/mph_Latn.txt
- config_name: mpm_Latn
data_files:
- split: train
path: data-0.95/mpm_Latn.txt
- config_name: mpp_Latn
data_files:
- split: train
path: data-0.95/mpp_Latn.txt
- config_name: mps_Latn
data_files:
- split: train
path: data-0.95/mps_Latn.txt
- config_name: mpt_Latn
data_files:
- split: train
path: data-0.95/mpt_Latn.txt
- config_name: mpx_Latn
data_files:
- split: train
path: data-0.95/mpx_Latn.txt
- config_name: mqb_Latn
data_files:
- split: train
path: data-0.95/mqb_Latn.txt
- config_name: mqj_Latn
data_files:
- split: train
path: data-0.95/mqj_Latn.txt
- config_name: mqy_Latn
data_files:
- split: train
path: data-0.95/mqy_Latn.txt
- config_name: mrg_Latn
data_files:
- split: train
path: data-0.95/mrg_Latn.txt
- config_name: mri_Latn
data_files:
- split: train
path: data-0.95/mri_Latn.txt
- config_name: mrj_Cyrl
data_files:
- split: train
path: data-0.95/mrj_Cyrl.txt
- config_name: mrq_Latn
data_files:
- split: train
path: data-0.95/mrq_Latn.txt
- config_name: mrv_Latn
data_files:
- split: train
path: data-0.95/mrv_Latn.txt
- config_name: mrw_Latn
data_files:
- split: train
path: data-0.95/mrw_Latn.txt
- config_name: msb_Latn
data_files:
- split: train
path: data-0.95/msb_Latn.txt
- config_name: msc_Latn
data_files:
- split: train
path: data-0.95/msc_Latn.txt
- config_name: mse_Latn
data_files:
- split: train
path: data-0.95/mse_Latn.txt
- config_name: msk_Latn
data_files:
- split: train
path: data-0.95/msk_Latn.txt
- config_name: msm_Latn
data_files:
- split: train
path: data-0.95/msm_Latn.txt
- config_name: msy_Latn
data_files:
- split: train
path: data-0.95/msy_Latn.txt
- config_name: mta_Latn
data_files:
- split: train
path: data-0.95/mta_Latn.txt
- config_name: mtg_Latn
data_files:
- split: train
path: data-0.95/mtg_Latn.txt
- config_name: mti_Latn
data_files:
- split: train
path: data-0.95/mti_Latn.txt
- config_name: mtj_Latn
data_files:
- split: train
path: data-0.95/mtj_Latn.txt
- config_name: mto_Latn
data_files:
- split: train
path: data-0.95/mto_Latn.txt
- config_name: mtp_Latn
data_files:
- split: train
path: data-0.95/mtp_Latn.txt
- config_name: mua_Latn
data_files:
- split: train
path: data-0.95/mua_Latn.txt
- config_name: mug_Latn
data_files:
- split: train
path: data-0.95/mug_Latn.txt
- config_name: muh_Latn
data_files:
- split: train
path: data-0.95/muh_Latn.txt
- config_name: mui_Latn
data_files:
- split: train
path: data-0.95/mui_Latn.txt
- config_name: mup_Deva
data_files:
- split: train
path: data-0.95/mup_Deva.txt
- config_name: mur_Latn
data_files:
- split: train
path: data-0.95/mur_Latn.txt
- config_name: mus_Latn
data_files:
- split: train
path: data-0.95/mus_Latn.txt
- config_name: mux_Latn
data_files:
- split: train
path: data-0.95/mux_Latn.txt
- config_name: muy_Latn
data_files:
- split: train
path: data-0.95/muy_Latn.txt
- config_name: mva_Latn
data_files:
- split: train
path: data-0.95/mva_Latn.txt
- config_name: mvn_Latn
data_files:
- split: train
path: data-0.95/mvn_Latn.txt
- config_name: mvp_Latn
data_files:
- split: train
path: data-0.95/mvp_Latn.txt
- config_name: mwc_Latn
data_files:
- split: train
path: data-0.95/mwc_Latn.txt
- config_name: mwf_Latn
data_files:
- split: train
path: data-0.95/mwf_Latn.txt
- config_name: mwl_Latn
data_files:
- split: train
path: data-0.95/mwl_Latn.txt
- config_name: mwm_Latn
data_files:
- split: train
path: data-0.95/mwm_Latn.txt
- config_name: mwn_Latn
data_files:
- split: train
path: data-0.95/mwn_Latn.txt
- config_name: mwp_Latn
data_files:
- split: train
path: data-0.95/mwp_Latn.txt
- config_name: mwq_Latn
data_files:
- split: train
path: data-0.95/mwq_Latn.txt
- config_name: mwv_Latn
data_files:
- split: train
path: data-0.95/mwv_Latn.txt
- config_name: mww_Latn
data_files:
- split: train
path: data-0.95/mww_Latn.txt
- config_name: mxb_Latn
data_files:
- split: train
path: data-0.95/mxb_Latn.txt
- config_name: mxp_Latn
data_files:
- split: train
path: data-0.95/mxp_Latn.txt
- config_name: mxq_Latn
data_files:
- split: train
path: data-0.95/mxq_Latn.txt
- config_name: mxt_Latn
data_files:
- split: train
path: data-0.95/mxt_Latn.txt
- config_name: mxv_Latn
data_files:
- split: train
path: data-0.95/mxv_Latn.txt
- config_name: mya_Mymr
data_files:
- split: train
path: data-0.95/mya_Mymr.txt
- config_name: myb_Latn
data_files:
- split: train
path: data-0.95/myb_Latn.txt
- config_name: myk_Latn
data_files:
- split: train
path: data-0.95/myk_Latn.txt
- config_name: myu_Latn
data_files:
- split: train
path: data-0.95/myu_Latn.txt
- config_name: myv_Cyrl
data_files:
- split: train
path: data-0.95/myv_Cyrl.txt
- config_name: myw_Latn
data_files:
- split: train
path: data-0.95/myw_Latn.txt
- config_name: myx_Latn
data_files:
- split: train
path: data-0.95/myx_Latn.txt
- config_name: myy_Latn
data_files:
- split: train
path: data-0.95/myy_Latn.txt
- config_name: mza_Latn
data_files:
- split: train
path: data-0.95/mza_Latn.txt
- config_name: mzh_Latn
data_files:
- split: train
path: data-0.95/mzh_Latn.txt
- config_name: mzk_Latn
data_files:
- split: train
path: data-0.95/mzk_Latn.txt
- config_name: mzl_Latn
data_files:
- split: train
path: data-0.95/mzl_Latn.txt
- config_name: mzm_Latn
data_files:
- split: train
path: data-0.95/mzm_Latn.txt
- config_name: mzn_Arab
data_files:
- split: train
path: data-0.95/mzn_Arab.txt
- config_name: mzw_Latn
data_files:
- split: train
path: data-0.95/mzw_Latn.txt
- config_name: mzz_Latn
data_files:
- split: train
path: data-0.95/mzz_Latn.txt
- config_name: nab_Latn
data_files:
- split: train
path: data-0.95/nab_Latn.txt
- config_name: naf_Latn
data_files:
- split: train
path: data-0.95/naf_Latn.txt
- config_name: nah_Latn
data_files:
- split: train
path: data-0.95/nah_Latn.txt
- config_name: nak_Latn
data_files:
- split: train
path: data-0.95/nak_Latn.txt
- config_name: nan_Latn
data_files:
- split: train
path: data-0.95/nan_Latn.txt
- config_name: nap_Latn
data_files:
- split: train
path: data-0.95/nap_Latn.txt
- config_name: naq_Latn
data_files:
- split: train
path: data-0.95/naq_Latn.txt
- config_name: nas_Latn
data_files:
- split: train
path: data-0.95/nas_Latn.txt
- config_name: nav_Latn
data_files:
- split: train
path: data-0.95/nav_Latn.txt
- config_name: naw_Latn
data_files:
- split: train
path: data-0.95/naw_Latn.txt
- config_name: nba_Latn
data_files:
- split: train
path: data-0.95/nba_Latn.txt
- config_name: nbc_Latn
data_files:
- split: train
path: data-0.95/nbc_Latn.txt
- config_name: nbe_Latn
data_files:
- split: train
path: data-0.95/nbe_Latn.txt
- config_name: nbl_Latn
data_files:
- split: train
path: data-0.95/nbl_Latn.txt
- config_name: nbq_Latn
data_files:
- split: train
path: data-0.95/nbq_Latn.txt
- config_name: nbu_Latn
data_files:
- split: train
path: data-0.95/nbu_Latn.txt
- config_name: nca_Latn
data_files:
- split: train
path: data-0.95/nca_Latn.txt
- config_name: nch_Latn
data_files:
- split: train
path: data-0.95/nch_Latn.txt
- config_name: ncj_Latn
data_files:
- split: train
path: data-0.95/ncj_Latn.txt
- config_name: ncl_Latn
data_files:
- split: train
path: data-0.95/ncl_Latn.txt
- config_name: ncq_Laoo
data_files:
- split: train
path: data-0.95/ncq_Laoo.txt
- config_name: nct_Latn
data_files:
- split: train
path: data-0.95/nct_Latn.txt
- config_name: ncu_Latn
data_files:
- split: train
path: data-0.95/ncu_Latn.txt
- config_name: ncx_Latn
data_files:
- split: train
path: data-0.95/ncx_Latn.txt
- config_name: ndc_Latn
data_files:
- split: train
path: data-0.95/ndc_Latn.txt
- config_name: nde_Latn
data_files:
- split: train
path: data-0.95/nde_Latn.txt
- config_name: ndh_Latn
data_files:
- split: train
path: data-0.95/ndh_Latn.txt
- config_name: ndi_Latn
data_files:
- split: train
path: data-0.95/ndi_Latn.txt
- config_name: ndj_Latn
data_files:
- split: train
path: data-0.95/ndj_Latn.txt
- config_name: ndo_Latn
data_files:
- split: train
path: data-0.95/ndo_Latn.txt
- config_name: ndp_Latn
data_files:
- split: train
path: data-0.95/ndp_Latn.txt
- config_name: nds_Latn
data_files:
- split: train
path: data-0.95/nds_Latn.txt
- config_name: ndy_Latn
data_files:
- split: train
path: data-0.95/ndy_Latn.txt
- config_name: ndz_Latn
data_files:
- split: train
path: data-0.95/ndz_Latn.txt
- config_name: neb_Latn
data_files:
- split: train
path: data-0.95/neb_Latn.txt
- config_name: new_Deva
data_files:
- split: train
path: data-0.95/new_Deva.txt
- config_name: nfa_Latn
data_files:
- split: train
path: data-0.95/nfa_Latn.txt
- config_name: nfr_Latn
data_files:
- split: train
path: data-0.95/nfr_Latn.txt
- config_name: ngb_Latn
data_files:
- split: train
path: data-0.95/ngb_Latn.txt
- config_name: ngc_Latn
data_files:
- split: train
path: data-0.95/ngc_Latn.txt
- config_name: ngl_Latn
data_files:
- split: train
path: data-0.95/ngl_Latn.txt
- config_name: ngp_Latn
data_files:
- split: train
path: data-0.95/ngp_Latn.txt
- config_name: ngu_Latn
data_files:
- split: train
path: data-0.95/ngu_Latn.txt
- config_name: nhd_Latn
data_files:
- split: train
path: data-0.95/nhd_Latn.txt
- config_name: nhe_Latn
data_files:
- split: train
path: data-0.95/nhe_Latn.txt
- config_name: nhg_Latn
data_files:
- split: train
path: data-0.95/nhg_Latn.txt
- config_name: nhi_Latn
data_files:
- split: train
path: data-0.95/nhi_Latn.txt
- config_name: nhk_Latn
data_files:
- split: train
path: data-0.95/nhk_Latn.txt
- config_name: nho_Latn
data_files:
- split: train
path: data-0.95/nho_Latn.txt
- config_name: nhr_Latn
data_files:
- split: train
path: data-0.95/nhr_Latn.txt
- config_name: nhu_Latn
data_files:
- split: train
path: data-0.95/nhu_Latn.txt
- config_name: nhw_Latn
data_files:
- split: train
path: data-0.95/nhw_Latn.txt
- config_name: nhx_Latn
data_files:
- split: train
path: data-0.95/nhx_Latn.txt
- config_name: nhy_Latn
data_files:
- split: train
path: data-0.95/nhy_Latn.txt
- config_name: nia_Latn
data_files:
- split: train
path: data-0.95/nia_Latn.txt
- config_name: nif_Latn
data_files:
- split: train
path: data-0.95/nif_Latn.txt
- config_name: nii_Latn
data_files:
- split: train
path: data-0.95/nii_Latn.txt
- config_name: nij_Latn
data_files:
- split: train
path: data-0.95/nij_Latn.txt
- config_name: nim_Latn
data_files:
- split: train
path: data-0.95/nim_Latn.txt
- config_name: nin_Latn
data_files:
- split: train
path: data-0.95/nin_Latn.txt
- config_name: nio_Cyrl
data_files:
- split: train
path: data-0.95/nio_Cyrl.txt
- config_name: niq_Latn
data_files:
- split: train
path: data-0.95/niq_Latn.txt
- config_name: niu_Latn
data_files:
- split: train
path: data-0.95/niu_Latn.txt
- config_name: niy_Latn
data_files:
- split: train
path: data-0.95/niy_Latn.txt
- config_name: njb_Latn
data_files:
- split: train
path: data-0.95/njb_Latn.txt
- config_name: njm_Latn
data_files:
- split: train
path: data-0.95/njm_Latn.txt
- config_name: njn_Latn
data_files:
- split: train
path: data-0.95/njn_Latn.txt
- config_name: njo_Latn
data_files:
- split: train
path: data-0.95/njo_Latn.txt
- config_name: njz_Latn
data_files:
- split: train
path: data-0.95/njz_Latn.txt
- config_name: nkf_Latn
data_files:
- split: train
path: data-0.95/nkf_Latn.txt
- config_name: nki_Latn
data_files:
- split: train
path: data-0.95/nki_Latn.txt
- config_name: nko_Latn
data_files:
- split: train
path: data-0.95/nko_Latn.txt
- config_name: nla_Latn
data_files:
- split: train
path: data-0.95/nla_Latn.txt
- config_name: nlc_Latn
data_files:
- split: train
path: data-0.95/nlc_Latn.txt
- config_name: nld_Latn
data_files:
- split: train
path: data-0.95/nld_Latn.txt
- config_name: nlg_Latn
data_files:
- split: train
path: data-0.95/nlg_Latn.txt
- config_name: nma_Latn
data_files:
- split: train
path: data-0.95/nma_Latn.txt
- config_name: nmf_Latn
data_files:
- split: train
path: data-0.95/nmf_Latn.txt
- config_name: nmh_Latn
data_files:
- split: train
path: data-0.95/nmh_Latn.txt
- config_name: nmo_Latn
data_files:
- split: train
path: data-0.95/nmo_Latn.txt
- config_name: nmw_Latn
data_files:
- split: train
path: data-0.95/nmw_Latn.txt
- config_name: nmz_Latn
data_files:
- split: train
path: data-0.95/nmz_Latn.txt
- config_name: nnb_Latn
data_files:
- split: train
path: data-0.95/nnb_Latn.txt
- config_name: nng_Latn
data_files:
- split: train
path: data-0.95/nng_Latn.txt
- config_name: nnh_Latn
data_files:
- split: train
path: data-0.95/nnh_Latn.txt
- config_name: nnl_Latn
data_files:
- split: train
path: data-0.95/nnl_Latn.txt
- config_name: nno_Latn
data_files:
- split: train
path: data-0.95/nno_Latn.txt
- config_name: nnp_Latn
data_files:
- split: train
path: data-0.95/nnp_Latn.txt
- config_name: nnq_Latn
data_files:
- split: train
path: data-0.95/nnq_Latn.txt
- config_name: nnw_Latn
data_files:
- split: train
path: data-0.95/nnw_Latn.txt
- config_name: noa_Latn
data_files:
- split: train
path: data-0.95/noa_Latn.txt
- config_name: nob_Latn
data_files:
- split: train
path: data-0.95/nob_Latn.txt
- config_name: nod_Thai
data_files:
- split: train
path: data-0.95/nod_Thai.txt
- config_name: nog_Cyrl
data_files:
- split: train
path: data-0.95/nog_Cyrl.txt
- config_name: non_Latn
data_files:
- split: train
path: data-0.95/non_Latn.txt
- config_name: nop_Latn
data_files:
- split: train
path: data-0.95/nop_Latn.txt
- config_name: not_Latn
data_files:
- split: train
path: data-0.95/not_Latn.txt
- config_name: nou_Latn
data_files:
- split: train
path: data-0.95/nou_Latn.txt
- config_name: nov_Latn
data_files:
- split: train
path: data-0.95/nov_Latn.txt
- config_name: nph_Latn
data_files:
- split: train
path: data-0.95/nph_Latn.txt
- config_name: npi_Deva
data_files:
- split: train
path: data-0.95/npi_Deva.txt
- config_name: npi_Latn
data_files:
- split: train
path: data-0.95/npi_Latn.txt
- config_name: npl_Latn
data_files:
- split: train
path: data-0.95/npl_Latn.txt
- config_name: npo_Latn
data_files:
- split: train
path: data-0.95/npo_Latn.txt
- config_name: npy_Latn
data_files:
- split: train
path: data-0.95/npy_Latn.txt
- config_name: nqo_Nkoo
data_files:
- split: train
path: data-0.95/nqo_Nkoo.txt
- config_name: nre_Latn
data_files:
- split: train
path: data-0.95/nre_Latn.txt
- config_name: nrf_Latn
data_files:
- split: train
path: data-0.95/nrf_Latn.txt
- config_name: nri_Latn
data_files:
- split: train
path: data-0.95/nri_Latn.txt
- config_name: nsa_Latn
data_files:
- split: train
path: data-0.95/nsa_Latn.txt
- config_name: nse_Latn
data_files:
- split: train
path: data-0.95/nse_Latn.txt
- config_name: nsm_Latn
data_files:
- split: train
path: data-0.95/nsm_Latn.txt
- config_name: nsn_Latn
data_files:
- split: train
path: data-0.95/nsn_Latn.txt
- config_name: nso_Latn
data_files:
- split: train
path: data-0.95/nso_Latn.txt
- config_name: nss_Latn
data_files:
- split: train
path: data-0.95/nss_Latn.txt
- config_name: nst_Latn
data_files:
- split: train
path: data-0.95/nst_Latn.txt
- config_name: nsu_Latn
data_files:
- split: train
path: data-0.95/nsu_Latn.txt
- config_name: ntp_Latn
data_files:
- split: train
path: data-0.95/ntp_Latn.txt
- config_name: ntr_Latn
data_files:
- split: train
path: data-0.95/ntr_Latn.txt
- config_name: ntu_Latn
data_files:
- split: train
path: data-0.95/ntu_Latn.txt
- config_name: nuj_Latn
data_files:
- split: train
path: data-0.95/nuj_Latn.txt
- config_name: nus_Latn
data_files:
- split: train
path: data-0.95/nus_Latn.txt
- config_name: nuy_Latn
data_files:
- split: train
path: data-0.95/nuy_Latn.txt
- config_name: nuz_Latn
data_files:
- split: train
path: data-0.95/nuz_Latn.txt
- config_name: nvm_Latn
data_files:
- split: train
path: data-0.95/nvm_Latn.txt
- config_name: nwb_Latn
data_files:
- split: train
path: data-0.95/nwb_Latn.txt
- config_name: nwi_Latn
data_files:
- split: train
path: data-0.95/nwi_Latn.txt
- config_name: nwx_Deva
data_files:
- split: train
path: data-0.95/nwx_Deva.txt
- config_name: nxd_Latn
data_files:
- split: train
path: data-0.95/nxd_Latn.txt
- config_name: nya_Latn
data_files:
- split: train
path: data-0.95/nya_Latn.txt
- config_name: nyf_Latn
data_files:
- split: train
path: data-0.95/nyf_Latn.txt
- config_name: nyk_Latn
data_files:
- split: train
path: data-0.95/nyk_Latn.txt
- config_name: nyn_Latn
data_files:
- split: train
path: data-0.95/nyn_Latn.txt
- config_name: nyo_Latn
data_files:
- split: train
path: data-0.95/nyo_Latn.txt
- config_name: nyu_Latn
data_files:
- split: train
path: data-0.95/nyu_Latn.txt
- config_name: nyy_Latn
data_files:
- split: train
path: data-0.95/nyy_Latn.txt
- config_name: nza_Latn
data_files:
- split: train
path: data-0.95/nza_Latn.txt
- config_name: nzi_Latn
data_files:
- split: train
path: data-0.95/nzi_Latn.txt
- config_name: nzm_Latn
data_files:
- split: train
path: data-0.95/nzm_Latn.txt
- config_name: obo_Latn
data_files:
- split: train
path: data-0.95/obo_Latn.txt
- config_name: oci_Latn
data_files:
- split: train
path: data-0.95/oci_Latn.txt
- config_name: ogo_Latn
data_files:
- split: train
path: data-0.95/ogo_Latn.txt
- config_name: ojb_Cans
data_files:
- split: train
path: data-0.95/ojb_Cans.txt
- config_name: ojb_Latn
data_files:
- split: train
path: data-0.95/ojb_Latn.txt
- config_name: oke_Latn
data_files:
- split: train
path: data-0.95/oke_Latn.txt
- config_name: oku_Latn
data_files:
- split: train
path: data-0.95/oku_Latn.txt
- config_name: okv_Latn
data_files:
- split: train
path: data-0.95/okv_Latn.txt
- config_name: old_Latn
data_files:
- split: train
path: data-0.95/old_Latn.txt
- config_name: olo_Latn
data_files:
- split: train
path: data-0.95/olo_Latn.txt
- config_name: omb_Latn
data_files:
- split: train
path: data-0.95/omb_Latn.txt
- config_name: omw_Latn
data_files:
- split: train
path: data-0.95/omw_Latn.txt
- config_name: ong_Latn
data_files:
- split: train
path: data-0.95/ong_Latn.txt
- config_name: ons_Latn
data_files:
- split: train
path: data-0.95/ons_Latn.txt
- config_name: ood_Latn
data_files:
- split: train
path: data-0.95/ood_Latn.txt
- config_name: opm_Latn
data_files:
- split: train
path: data-0.95/opm_Latn.txt
- config_name: orv_Cyrl
data_files:
- split: train
path: data-0.95/orv_Cyrl.txt
- config_name: ory_Latn
data_files:
- split: train
path: data-0.95/ory_Latn.txt
- config_name: ory_Orya
data_files:
- split: train
path: data-0.95/ory_Orya.txt
- config_name: oss_Cyrl
data_files:
- split: train
path: data-0.95/oss_Cyrl.txt
- config_name: ota_Arab
data_files:
- split: train
path: data-0.95/ota_Arab.txt
- config_name: otd_Latn
data_files:
- split: train
path: data-0.95/otd_Latn.txt
- config_name: ote_Latn
data_files:
- split: train
path: data-0.95/ote_Latn.txt
- config_name: otm_Latn
data_files:
- split: train
path: data-0.95/otm_Latn.txt
- config_name: otn_Latn
data_files:
- split: train
path: data-0.95/otn_Latn.txt
- config_name: oto_Latn
data_files:
- split: train
path: data-0.95/oto_Latn.txt
- config_name: otq_Latn
data_files:
- split: train
path: data-0.95/otq_Latn.txt
- config_name: ots_Latn
data_files:
- split: train
path: data-0.95/ots_Latn.txt
- config_name: otw_Latn
data_files:
- split: train
path: data-0.95/otw_Latn.txt
- config_name: oym_Latn
data_files:
- split: train
path: data-0.95/oym_Latn.txt
- config_name: ozm_Latn
data_files:
- split: train
path: data-0.95/ozm_Latn.txt
- config_name: pab_Latn
data_files:
- split: train
path: data-0.95/pab_Latn.txt
- config_name: pad_Latn
data_files:
- split: train
path: data-0.95/pad_Latn.txt
- config_name: pag_Latn
data_files:
- split: train
path: data-0.95/pag_Latn.txt
- config_name: pah_Latn
data_files:
- split: train
path: data-0.95/pah_Latn.txt
- config_name: pam_Latn
data_files:
- split: train
path: data-0.95/pam_Latn.txt
- config_name: pan_Guru
data_files:
- split: train
path: data-0.95/pan_Guru.txt
- config_name: pan_Latn
data_files:
- split: train
path: data-0.95/pan_Latn.txt
- config_name: pao_Latn
data_files:
- split: train
path: data-0.95/pao_Latn.txt
- config_name: pap_Latn
data_files:
- split: train
path: data-0.95/pap_Latn.txt
- config_name: pau_Latn
data_files:
- split: train
path: data-0.95/pau_Latn.txt
- config_name: pbb_Latn
data_files:
- split: train
path: data-0.95/pbb_Latn.txt
- config_name: pbc_Latn
data_files:
- split: train
path: data-0.95/pbc_Latn.txt
- config_name: pbi_Latn
data_files:
- split: train
path: data-0.95/pbi_Latn.txt
- config_name: pbt_Arab
data_files:
- split: train
path: data-0.95/pbt_Arab.txt
- config_name: pcd_Latn
data_files:
- split: train
path: data-0.95/pcd_Latn.txt
- config_name: pck_Latn
data_files:
- split: train
path: data-0.95/pck_Latn.txt
- config_name: pcm_Latn
data_files:
- split: train
path: data-0.95/pcm_Latn.txt
- config_name: pdc_Latn
data_files:
- split: train
path: data-0.95/pdc_Latn.txt
- config_name: pdt_Latn
data_files:
- split: train
path: data-0.95/pdt_Latn.txt
- config_name: pem_Latn
data_files:
- split: train
path: data-0.95/pem_Latn.txt
- config_name: pfe_Latn
data_files:
- split: train
path: data-0.95/pfe_Latn.txt
- config_name: pfl_Latn
data_files:
- split: train
path: data-0.95/pfl_Latn.txt
- config_name: phm_Latn
data_files:
- split: train
path: data-0.95/phm_Latn.txt
- config_name: pib_Latn
data_files:
- split: train
path: data-0.95/pib_Latn.txt
- config_name: pio_Latn
data_files:
- split: train
path: data-0.95/pio_Latn.txt
- config_name: pir_Latn
data_files:
- split: train
path: data-0.95/pir_Latn.txt
- config_name: pis_Latn
data_files:
- split: train
path: data-0.95/pis_Latn.txt
- config_name: pjt_Latn
data_files:
- split: train
path: data-0.95/pjt_Latn.txt
- config_name: pkb_Latn
data_files:
- split: train
path: data-0.95/pkb_Latn.txt
- config_name: plg_Latn
data_files:
- split: train
path: data-0.95/plg_Latn.txt
- config_name: pls_Latn
data_files:
- split: train
path: data-0.95/pls_Latn.txt
- config_name: plt_Latn
data_files:
- split: train
path: data-0.95/plt_Latn.txt
- config_name: plu_Latn
data_files:
- split: train
path: data-0.95/plu_Latn.txt
- config_name: plw_Latn
data_files:
- split: train
path: data-0.95/plw_Latn.txt
- config_name: pma_Latn
data_files:
- split: train
path: data-0.95/pma_Latn.txt
- config_name: pmf_Latn
data_files:
- split: train
path: data-0.95/pmf_Latn.txt
- config_name: pmq_Latn
data_files:
- split: train
path: data-0.95/pmq_Latn.txt
- config_name: pms_Latn
data_files:
- split: train
path: data-0.95/pms_Latn.txt
- config_name: pmx_Latn
data_files:
- split: train
path: data-0.95/pmx_Latn.txt
- config_name: pnb_Arab
data_files:
- split: train
path: data-0.95/pnb_Arab.txt
- config_name: pne_Latn
data_files:
- split: train
path: data-0.95/pne_Latn.txt
- config_name: pnt_Grek
data_files:
- split: train
path: data-0.95/pnt_Grek.txt
- config_name: pny_Latn
data_files:
- split: train
path: data-0.95/pny_Latn.txt
- config_name: poe_Latn
data_files:
- split: train
path: data-0.95/poe_Latn.txt
- config_name: poh_Latn
data_files:
- split: train
path: data-0.95/poh_Latn.txt
- config_name: poi_Latn
data_files:
- split: train
path: data-0.95/poi_Latn.txt
- config_name: pol_Latn
data_files:
- split: train
path: data-0.95/pol_Latn.txt
- config_name: pon_Latn
data_files:
- split: train
path: data-0.95/pon_Latn.txt
- config_name: por_Latn
data_files:
- split: train
path: data-0.95/por_Latn.txt
- config_name: pos_Latn
data_files:
- split: train
path: data-0.95/pos_Latn.txt
- config_name: pot_Latn
data_files:
- split: train
path: data-0.95/pot_Latn.txt
- config_name: pov_Latn
data_files:
- split: train
path: data-0.95/pov_Latn.txt
- config_name: poy_Latn
data_files:
- split: train
path: data-0.95/poy_Latn.txt
- config_name: ppk_Latn
data_files:
- split: train
path: data-0.95/ppk_Latn.txt
- config_name: ppo_Latn
data_files:
- split: train
path: data-0.95/ppo_Latn.txt
- config_name: pps_Latn
data_files:
- split: train
path: data-0.95/pps_Latn.txt
- config_name: prf_Latn
data_files:
- split: train
path: data-0.95/prf_Latn.txt
- config_name: prg_Latn
data_files:
- split: train
path: data-0.95/prg_Latn.txt
- config_name: pri_Latn
data_files:
- split: train
path: data-0.95/pri_Latn.txt
- config_name: prq_Latn
data_files:
- split: train
path: data-0.95/prq_Latn.txt
- config_name: pse_Latn
data_files:
- split: train
path: data-0.95/pse_Latn.txt
- config_name: pss_Latn
data_files:
- split: train
path: data-0.95/pss_Latn.txt
- config_name: ptp_Latn
data_files:
- split: train
path: data-0.95/ptp_Latn.txt
- config_name: ptu_Latn
data_files:
- split: train
path: data-0.95/ptu_Latn.txt
- config_name: pua_Latn
data_files:
- split: train
path: data-0.95/pua_Latn.txt
- config_name: pui_Latn
data_files:
- split: train
path: data-0.95/pui_Latn.txt
- config_name: pwg_Latn
data_files:
- split: train
path: data-0.95/pwg_Latn.txt
- config_name: pwn_Latn
data_files:
- split: train
path: data-0.95/pwn_Latn.txt
- config_name: pww_Thai
data_files:
- split: train
path: data-0.95/pww_Thai.txt
- config_name: pxm_Latn
data_files:
- split: train
path: data-0.95/pxm_Latn.txt
- config_name: qub_Latn
data_files:
- split: train
path: data-0.95/qub_Latn.txt
- config_name: quc_Latn
data_files:
- split: train
path: data-0.95/quc_Latn.txt
- config_name: quf_Latn
data_files:
- split: train
path: data-0.95/quf_Latn.txt
- config_name: qug_Latn
data_files:
- split: train
path: data-0.95/qug_Latn.txt
- config_name: quh_Latn
data_files:
- split: train
path: data-0.95/quh_Latn.txt
- config_name: qul_Latn
data_files:
- split: train
path: data-0.95/qul_Latn.txt
- config_name: qup_Latn
data_files:
- split: train
path: data-0.95/qup_Latn.txt
- config_name: qus_Latn
data_files:
- split: train
path: data-0.95/qus_Latn.txt
- config_name: quw_Latn
data_files:
- split: train
path: data-0.95/quw_Latn.txt
- config_name: quy_Latn
data_files:
- split: train
path: data-0.95/quy_Latn.txt
- config_name: quz_Latn
data_files:
- split: train
path: data-0.95/quz_Latn.txt
- config_name: qva_Latn
data_files:
- split: train
path: data-0.95/qva_Latn.txt
- config_name: qvc_Latn
data_files:
- split: train
path: data-0.95/qvc_Latn.txt
- config_name: qve_Latn
data_files:
- split: train
path: data-0.95/qve_Latn.txt
- config_name: qvh_Latn
data_files:
- split: train
path: data-0.95/qvh_Latn.txt
- config_name: qvi_Latn
data_files:
- split: train
path: data-0.95/qvi_Latn.txt
- config_name: qvm_Latn
data_files:
- split: train
path: data-0.95/qvm_Latn.txt
- config_name: qvn_Latn
data_files:
- split: train
path: data-0.95/qvn_Latn.txt
- config_name: qvo_Latn
data_files:
- split: train
path: data-0.95/qvo_Latn.txt
- config_name: qvs_Latn
data_files:
- split: train
path: data-0.95/qvs_Latn.txt
- config_name: qvw_Latn
data_files:
- split: train
path: data-0.95/qvw_Latn.txt
- config_name: qvz_Latn
data_files:
- split: train
path: data-0.95/qvz_Latn.txt
- config_name: qwh_Latn
data_files:
- split: train
path: data-0.95/qwh_Latn.txt
- config_name: qxh_Latn
data_files:
- split: train
path: data-0.95/qxh_Latn.txt
- config_name: qxl_Latn
data_files:
- split: train
path: data-0.95/qxl_Latn.txt
- config_name: qxn_Latn
data_files:
- split: train
path: data-0.95/qxn_Latn.txt
- config_name: qxo_Latn
data_files:
- split: train
path: data-0.95/qxo_Latn.txt
- config_name: qxr_Latn
data_files:
- split: train
path: data-0.95/qxr_Latn.txt
- config_name: rad_Latn
data_files:
- split: train
path: data-0.95/rad_Latn.txt
- config_name: rai_Latn
data_files:
- split: train
path: data-0.95/rai_Latn.txt
- config_name: ram_Latn
data_files:
- split: train
path: data-0.95/ram_Latn.txt
- config_name: rap_Latn
data_files:
- split: train
path: data-0.95/rap_Latn.txt
- config_name: rar_Latn
data_files:
- split: train
path: data-0.95/rar_Latn.txt
- config_name: rav_Deva
data_files:
- split: train
path: data-0.95/rav_Deva.txt
- config_name: raw_Latn
data_files:
- split: train
path: data-0.95/raw_Latn.txt
- config_name: rcf_Latn
data_files:
- split: train
path: data-0.95/rcf_Latn.txt
- config_name: rej_Latn
data_files:
- split: train
path: data-0.95/rej_Latn.txt
- config_name: rel_Latn
data_files:
- split: train
path: data-0.95/rel_Latn.txt
- config_name: rgu_Latn
data_files:
- split: train
path: data-0.95/rgu_Latn.txt
- config_name: rhg_Latn
data_files:
- split: train
path: data-0.95/rhg_Latn.txt
- config_name: ria_Latn
data_files:
- split: train
path: data-0.95/ria_Latn.txt
- config_name: rim_Latn
data_files:
- split: train
path: data-0.95/rim_Latn.txt
- config_name: rjs_Deva
data_files:
- split: train
path: data-0.95/rjs_Deva.txt
- config_name: rkb_Latn
data_files:
- split: train
path: data-0.95/rkb_Latn.txt
- config_name: rmc_Latn
data_files:
- split: train
path: data-0.95/rmc_Latn.txt
- config_name: rme_Latn
data_files:
- split: train
path: data-0.95/rme_Latn.txt
- config_name: rml_Latn
data_files:
- split: train
path: data-0.95/rml_Latn.txt
- config_name: rmn_Cyrl
data_files:
- split: train
path: data-0.95/rmn_Cyrl.txt
- config_name: rmn_Grek
data_files:
- split: train
path: data-0.95/rmn_Grek.txt
- config_name: rmn_Latn
data_files:
- split: train
path: data-0.95/rmn_Latn.txt
- config_name: rmo_Latn
data_files:
- split: train
path: data-0.95/rmo_Latn.txt
- config_name: rmq_Latn
data_files:
- split: train
path: data-0.95/rmq_Latn.txt
- config_name: rmy_Cyrl
data_files:
- split: train
path: data-0.95/rmy_Cyrl.txt
- config_name: rmy_Latn
data_files:
- split: train
path: data-0.95/rmy_Latn.txt
- config_name: rnd_Latn
data_files:
- split: train
path: data-0.95/rnd_Latn.txt
- config_name: rng_Latn
data_files:
- split: train
path: data-0.95/rng_Latn.txt
- config_name: rnl_Latn
data_files:
- split: train
path: data-0.95/rnl_Latn.txt
- config_name: roh_Latn
data_files:
- split: train
path: data-0.95/roh_Latn.txt
- config_name: ron_Cyrl
data_files:
- split: train
path: data-0.95/ron_Cyrl.txt
- config_name: ron_Latn
data_files:
- split: train
path: data-0.95/ron_Latn.txt
- config_name: roo_Latn
data_files:
- split: train
path: data-0.95/roo_Latn.txt
- config_name: rop_Latn
data_files:
- split: train
path: data-0.95/rop_Latn.txt
- config_name: row_Latn
data_files:
- split: train
path: data-0.95/row_Latn.txt
- config_name: rro_Latn
data_files:
- split: train
path: data-0.95/rro_Latn.txt
- config_name: rtm_Latn
data_files:
- split: train
path: data-0.95/rtm_Latn.txt
- config_name: rub_Latn
data_files:
- split: train
path: data-0.95/rub_Latn.txt
- config_name: rue_Cyrl
data_files:
- split: train
path: data-0.95/rue_Cyrl.txt
- config_name: ruf_Latn
data_files:
- split: train
path: data-0.95/ruf_Latn.txt
- config_name: rug_Latn
data_files:
- split: train
path: data-0.95/rug_Latn.txt
- config_name: run_Latn
data_files:
- split: train
path: data-0.95/run_Latn.txt
- config_name: rup_Latn
data_files:
- split: train
path: data-0.95/rup_Latn.txt
- config_name: rus_Cyrl
data_files:
- split: train
path: data-0.95/rus_Cyrl.txt
- config_name: rwo_Latn
data_files:
- split: train
path: data-0.95/rwo_Latn.txt
- config_name: sab_Latn
data_files:
- split: train
path: data-0.95/sab_Latn.txt
- config_name: sag_Latn
data_files:
- split: train
path: data-0.95/sag_Latn.txt
- config_name: sah_Cyrl
data_files:
- split: train
path: data-0.95/sah_Cyrl.txt
- config_name: saj_Latn
data_files:
- split: train
path: data-0.95/saj_Latn.txt
- config_name: san_Deva
data_files:
- split: train
path: data-0.95/san_Deva.txt
- config_name: san_Latn
data_files:
- split: train
path: data-0.95/san_Latn.txt
- config_name: sas_Latn
data_files:
- split: train
path: data-0.95/sas_Latn.txt
- config_name: sat_Latn
data_files:
- split: train
path: data-0.95/sat_Latn.txt
- config_name: sat_Olck
data_files:
- split: train
path: data-0.95/sat_Olck.txt
- config_name: say_Latn
data_files:
- split: train
path: data-0.95/say_Latn.txt
- config_name: sba_Latn
data_files:
- split: train
path: data-0.95/sba_Latn.txt
- config_name: sbd_Latn
data_files:
- split: train
path: data-0.95/sbd_Latn.txt
- config_name: sbe_Latn
data_files:
- split: train
path: data-0.95/sbe_Latn.txt
- config_name: sbl_Latn
data_files:
- split: train
path: data-0.95/sbl_Latn.txt
- config_name: sbs_Latn
data_files:
- split: train
path: data-0.95/sbs_Latn.txt
- config_name: sby_Latn
data_files:
- split: train
path: data-0.95/sby_Latn.txt
- config_name: sck_Deva
data_files:
- split: train
path: data-0.95/sck_Deva.txt
- config_name: scn_Latn
data_files:
- split: train
path: data-0.95/scn_Latn.txt
- config_name: sco_Latn
data_files:
- split: train
path: data-0.95/sco_Latn.txt
- config_name: sda_Latn
data_files:
- split: train
path: data-0.95/sda_Latn.txt
- config_name: sdc_Latn
data_files:
- split: train
path: data-0.95/sdc_Latn.txt
- config_name: sdh_Arab
data_files:
- split: train
path: data-0.95/sdh_Arab.txt
- config_name: sdo_Latn
data_files:
- split: train
path: data-0.95/sdo_Latn.txt
- config_name: sdq_Latn
data_files:
- split: train
path: data-0.95/sdq_Latn.txt
- config_name: seh_Latn
data_files:
- split: train
path: data-0.95/seh_Latn.txt
- config_name: sel_Cyrl
data_files:
- split: train
path: data-0.95/sel_Cyrl.txt
- config_name: ses_Latn
data_files:
- split: train
path: data-0.95/ses_Latn.txt
- config_name: sey_Latn
data_files:
- split: train
path: data-0.95/sey_Latn.txt
- config_name: sfw_Latn
data_files:
- split: train
path: data-0.95/sfw_Latn.txt
- config_name: sgb_Latn
data_files:
- split: train
path: data-0.95/sgb_Latn.txt
- config_name: sgc_Latn
data_files:
- split: train
path: data-0.95/sgc_Latn.txt
- config_name: sgh_Cyrl
data_files:
- split: train
path: data-0.95/sgh_Cyrl.txt
- config_name: sgs_Latn
data_files:
- split: train
path: data-0.95/sgs_Latn.txt
- config_name: sgw_Ethi
data_files:
- split: train
path: data-0.95/sgw_Ethi.txt
- config_name: sgz_Latn
data_files:
- split: train
path: data-0.95/sgz_Latn.txt
- config_name: shi_Latn
data_files:
- split: train
path: data-0.95/shi_Latn.txt
- config_name: shk_Latn
data_files:
- split: train
path: data-0.95/shk_Latn.txt
- config_name: shn_Mymr
data_files:
- split: train
path: data-0.95/shn_Mymr.txt
- config_name: shp_Latn
data_files:
- split: train
path: data-0.95/shp_Latn.txt
- config_name: shr_Latn
data_files:
- split: train
path: data-0.95/shr_Latn.txt
- config_name: shu_Arab
data_files:
- split: train
path: data-0.95/shu_Arab.txt
- config_name: sid_Latn
data_files:
- split: train
path: data-0.95/sid_Latn.txt
- config_name: sig_Latn
data_files:
- split: train
path: data-0.95/sig_Latn.txt
- config_name: sil_Latn
data_files:
- split: train
path: data-0.95/sil_Latn.txt
- config_name: sim_Latn
data_files:
- split: train
path: data-0.95/sim_Latn.txt
- config_name: sin_Sinh
data_files:
- split: train
path: data-0.95/sin_Sinh.txt
- config_name: sja_Latn
data_files:
- split: train
path: data-0.95/sja_Latn.txt
- config_name: sjo_Mong
data_files:
- split: train
path: data-0.95/sjo_Mong.txt
- config_name: sju_Latn
data_files:
- split: train
path: data-0.95/sju_Latn.txt
- config_name: skg_Latn
data_files:
- split: train
path: data-0.95/skg_Latn.txt
- config_name: skr_Arab
data_files:
- split: train
path: data-0.95/skr_Arab.txt
- config_name: sld_Latn
data_files:
- split: train
path: data-0.95/sld_Latn.txt
- config_name: slk_Latn
data_files:
- split: train
path: data-0.95/slk_Latn.txt
- config_name: sll_Latn
data_files:
- split: train
path: data-0.95/sll_Latn.txt
- config_name: slv_Latn
data_files:
- split: train
path: data-0.95/slv_Latn.txt
- config_name: sma_Latn
data_files:
- split: train
path: data-0.95/sma_Latn.txt
- config_name: sme_Latn
data_files:
- split: train
path: data-0.95/sme_Latn.txt
- config_name: smj_Latn
data_files:
- split: train
path: data-0.95/smj_Latn.txt
- config_name: smk_Latn
data_files:
- split: train
path: data-0.95/smk_Latn.txt
- config_name: sml_Latn
data_files:
- split: train
path: data-0.95/sml_Latn.txt
- config_name: smn_Latn
data_files:
- split: train
path: data-0.95/smn_Latn.txt
- config_name: smo_Latn
data_files:
- split: train
path: data-0.95/smo_Latn.txt
- config_name: sms_Latn
data_files:
- split: train
path: data-0.95/sms_Latn.txt
- config_name: smt_Latn
data_files:
- split: train
path: data-0.95/smt_Latn.txt
- config_name: sna_Latn
data_files:
- split: train
path: data-0.95/sna_Latn.txt
- config_name: snc_Latn
data_files:
- split: train
path: data-0.95/snc_Latn.txt
- config_name: snd_Arab
data_files:
- split: train
path: data-0.95/snd_Arab.txt
- config_name: snd_Deva
data_files:
- split: train
path: data-0.95/snd_Deva.txt
- config_name: snd_Latn
data_files:
- split: train
path: data-0.95/snd_Latn.txt
- config_name: snf_Latn
data_files:
- split: train
path: data-0.95/snf_Latn.txt
- config_name: snn_Latn
data_files:
- split: train
path: data-0.95/snn_Latn.txt
- config_name: snp_Latn
data_files:
- split: train
path: data-0.95/snp_Latn.txt
- config_name: snw_Latn
data_files:
- split: train
path: data-0.95/snw_Latn.txt
- config_name: sny_Latn
data_files:
- split: train
path: data-0.95/sny_Latn.txt
- config_name: soe_Latn
data_files:
- split: train
path: data-0.95/soe_Latn.txt
- config_name: som_Latn
data_files:
- split: train
path: data-0.95/som_Latn.txt
- config_name: sop_Latn
data_files:
- split: train
path: data-0.95/sop_Latn.txt
- config_name: soq_Latn
data_files:
- split: train
path: data-0.95/soq_Latn.txt
- config_name: sot_Latn
data_files:
- split: train
path: data-0.95/sot_Latn.txt
- config_name: soy_Latn
data_files:
- split: train
path: data-0.95/soy_Latn.txt
- config_name: spa_Latn
data_files:
- split: train
path: data-0.95/spa_Latn.txt
- config_name: spl_Latn
data_files:
- split: train
path: data-0.95/spl_Latn.txt
- config_name: spm_Latn
data_files:
- split: train
path: data-0.95/spm_Latn.txt
- config_name: spp_Latn
data_files:
- split: train
path: data-0.95/spp_Latn.txt
- config_name: sps_Latn
data_files:
- split: train
path: data-0.95/sps_Latn.txt
- config_name: spy_Latn
data_files:
- split: train
path: data-0.95/spy_Latn.txt
- config_name: srd_Latn
data_files:
- split: train
path: data-0.95/srd_Latn.txt
- config_name: sri_Latn
data_files:
- split: train
path: data-0.95/sri_Latn.txt
- config_name: srm_Latn
data_files:
- split: train
path: data-0.95/srm_Latn.txt
- config_name: srn_Latn
data_files:
- split: train
path: data-0.95/srn_Latn.txt
- config_name: srp_Cyrl
data_files:
- split: train
path: data-0.95/srp_Cyrl.txt
- config_name: srp_Latn
data_files:
- split: train
path: data-0.95/srp_Latn.txt
- config_name: srq_Latn
data_files:
- split: train
path: data-0.95/srq_Latn.txt
- config_name: srr_Latn
data_files:
- split: train
path: data-0.95/srr_Latn.txt
- config_name: ssd_Latn
data_files:
- split: train
path: data-0.95/ssd_Latn.txt
- config_name: ssg_Latn
data_files:
- split: train
path: data-0.95/ssg_Latn.txt
- config_name: ssw_Latn
data_files:
- split: train
path: data-0.95/ssw_Latn.txt
- config_name: ssx_Latn
data_files:
- split: train
path: data-0.95/ssx_Latn.txt
- config_name: stn_Latn
data_files:
- split: train
path: data-0.95/stn_Latn.txt
- config_name: stp_Latn
data_files:
- split: train
path: data-0.95/stp_Latn.txt
- config_name: stq_Latn
data_files:
- split: train
path: data-0.95/stq_Latn.txt
- config_name: sua_Latn
data_files:
- split: train
path: data-0.95/sua_Latn.txt
- config_name: suc_Latn
data_files:
- split: train
path: data-0.95/suc_Latn.txt
- config_name: sue_Latn
data_files:
- split: train
path: data-0.95/sue_Latn.txt
- config_name: suk_Latn
data_files:
- split: train
path: data-0.95/suk_Latn.txt
- config_name: sun_Latn
data_files:
- split: train
path: data-0.95/sun_Latn.txt
- config_name: sur_Latn
data_files:
- split: train
path: data-0.95/sur_Latn.txt
- config_name: sus_Arab
data_files:
- split: train
path: data-0.95/sus_Arab.txt
- config_name: sus_Latn
data_files:
- split: train
path: data-0.95/sus_Latn.txt
- config_name: suz_Deva
data_files:
- split: train
path: data-0.95/suz_Deva.txt
- config_name: swb_Latn
data_files:
- split: train
path: data-0.95/swb_Latn.txt
- config_name: swc_Latn
data_files:
- split: train
path: data-0.95/swc_Latn.txt
- config_name: swe_Latn
data_files:
- split: train
path: data-0.95/swe_Latn.txt
- config_name: swg_Latn
data_files:
- split: train
path: data-0.95/swg_Latn.txt
- config_name: swh_Latn
data_files:
- split: train
path: data-0.95/swh_Latn.txt
- config_name: swk_Latn
data_files:
- split: train
path: data-0.95/swk_Latn.txt
- config_name: swp_Latn
data_files:
- split: train
path: data-0.95/swp_Latn.txt
- config_name: sxb_Latn
data_files:
- split: train
path: data-0.95/sxb_Latn.txt
- config_name: sxn_Latn
data_files:
- split: train
path: data-0.95/sxn_Latn.txt
- config_name: syb_Latn
data_files:
- split: train
path: data-0.95/syb_Latn.txt
- config_name: syc_Syrc
data_files:
- split: train
path: data-0.95/syc_Syrc.txt
- config_name: syl_Beng
data_files:
- split: train
path: data-0.95/syl_Beng.txt
- config_name: syl_Sylo
data_files:
- split: train
path: data-0.95/syl_Sylo.txt
- config_name: syl_Latn
data_files:
- split: train
path: data-0.95/syl_Latn.txt
- config_name: szb_Latn
data_files:
- split: train
path: data-0.95/szb_Latn.txt
- config_name: szl_Latn
data_files:
- split: train
path: data-0.95/szl_Latn.txt
- config_name: szy_Latn
data_files:
- split: train
path: data-0.95/szy_Latn.txt
- config_name: tab_Cyrl
data_files:
- split: train
path: data-0.95/tab_Cyrl.txt
- config_name: tac_Latn
data_files:
- split: train
path: data-0.95/tac_Latn.txt
- config_name: tah_Latn
data_files:
- split: train
path: data-0.95/tah_Latn.txt
- config_name: taj_Deva
data_files:
- split: train
path: data-0.95/taj_Deva.txt
- config_name: tam_Latn
data_files:
- split: train
path: data-0.95/tam_Latn.txt
- config_name: tam_Taml
data_files:
- split: train
path: data-0.95/tam_Taml.txt
- config_name: tap_Latn
data_files:
- split: train
path: data-0.95/tap_Latn.txt
- config_name: taq_Latn
data_files:
- split: train
path: data-0.95/taq_Latn.txt
- config_name: taq_Tfng
data_files:
- split: train
path: data-0.95/taq_Tfng.txt
- config_name: tar_Latn
data_files:
- split: train
path: data-0.95/tar_Latn.txt
- config_name: tat_Cyrl
data_files:
- split: train
path: data-0.95/tat_Cyrl.txt
- config_name: tat_Latn
data_files:
- split: train
path: data-0.95/tat_Latn.txt
- config_name: tav_Latn
data_files:
- split: train
path: data-0.95/tav_Latn.txt
- config_name: taw_Latn
data_files:
- split: train
path: data-0.95/taw_Latn.txt
- config_name: tay_Latn
data_files:
- split: train
path: data-0.95/tay_Latn.txt
- config_name: tbc_Latn
data_files:
- split: train
path: data-0.95/tbc_Latn.txt
- config_name: tbg_Latn
data_files:
- split: train
path: data-0.95/tbg_Latn.txt
- config_name: tbk_Latn
data_files:
- split: train
path: data-0.95/tbk_Latn.txt
- config_name: tbl_Latn
data_files:
- split: train
path: data-0.95/tbl_Latn.txt
- config_name: tbo_Latn
data_files:
- split: train
path: data-0.95/tbo_Latn.txt
- config_name: tbw_Latn
data_files:
- split: train
path: data-0.95/tbw_Latn.txt
- config_name: tby_Latn
data_files:
- split: train
path: data-0.95/tby_Latn.txt
- config_name: tbz_Latn
data_files:
- split: train
path: data-0.95/tbz_Latn.txt
- config_name: tca_Latn
data_files:
- split: train
path: data-0.95/tca_Latn.txt
- config_name: tcc_Latn
data_files:
- split: train
path: data-0.95/tcc_Latn.txt
- config_name: tcf_Latn
data_files:
- split: train
path: data-0.95/tcf_Latn.txt
- config_name: tcs_Latn
data_files:
- split: train
path: data-0.95/tcs_Latn.txt
- config_name: tcy_Knda
data_files:
- split: train
path: data-0.95/tcy_Knda.txt
- config_name: tcz_Latn
data_files:
- split: train
path: data-0.95/tcz_Latn.txt
- config_name: tdx_Latn
data_files:
- split: train
path: data-0.95/tdx_Latn.txt
- config_name: ted_Latn
data_files:
- split: train
path: data-0.95/ted_Latn.txt
- config_name: tee_Latn
data_files:
- split: train
path: data-0.95/tee_Latn.txt
- config_name: tel_Latn
data_files:
- split: train
path: data-0.95/tel_Latn.txt
- config_name: tel_Telu
data_files:
- split: train
path: data-0.95/tel_Telu.txt
- config_name: tem_Latn
data_files:
- split: train
path: data-0.95/tem_Latn.txt
- config_name: teo_Latn
data_files:
- split: train
path: data-0.95/teo_Latn.txt
- config_name: ter_Latn
data_files:
- split: train
path: data-0.95/ter_Latn.txt
- config_name: tet_Latn
data_files:
- split: train
path: data-0.95/tet_Latn.txt
- config_name: tew_Latn
data_files:
- split: train
path: data-0.95/tew_Latn.txt
- config_name: tfr_Latn
data_files:
- split: train
path: data-0.95/tfr_Latn.txt
- config_name: tgk_Cyrl
data_files:
- split: train
path: data-0.95/tgk_Cyrl.txt
- config_name: tgo_Latn
data_files:
- split: train
path: data-0.95/tgo_Latn.txt
- config_name: tgp_Latn
data_files:
- split: train
path: data-0.95/tgp_Latn.txt
- config_name: tha_Thai
data_files:
- split: train
path: data-0.95/tha_Thai.txt
- config_name: thk_Latn
data_files:
- split: train
path: data-0.95/thk_Latn.txt
- config_name: thl_Deva
data_files:
- split: train
path: data-0.95/thl_Deva.txt
- config_name: thv_Latn
data_files:
- split: train
path: data-0.95/thv_Latn.txt
- config_name: tif_Latn
data_files:
- split: train
path: data-0.95/tif_Latn.txt
- config_name: tig_Ethi
data_files:
- split: train
path: data-0.95/tig_Ethi.txt
- config_name: tih_Latn
data_files:
- split: train
path: data-0.95/tih_Latn.txt
- config_name: tik_Latn
data_files:
- split: train
path: data-0.95/tik_Latn.txt
- config_name: tim_Latn
data_files:
- split: train
path: data-0.95/tim_Latn.txt
- config_name: tir_Ethi
data_files:
- split: train
path: data-0.95/tir_Ethi.txt
- config_name: tiv_Latn
data_files:
- split: train
path: data-0.95/tiv_Latn.txt
- config_name: tiy_Latn
data_files:
- split: train
path: data-0.95/tiy_Latn.txt
- config_name: tke_Latn
data_files:
- split: train
path: data-0.95/tke_Latn.txt
- config_name: tkl_Latn
data_files:
- split: train
path: data-0.95/tkl_Latn.txt
- config_name: tkr_Cyrl
data_files:
- split: train
path: data-0.95/tkr_Cyrl.txt
- config_name: tku_Latn
data_files:
- split: train
path: data-0.95/tku_Latn.txt
- config_name: tlb_Latn
data_files:
- split: train
path: data-0.95/tlb_Latn.txt
- config_name: tlf_Latn
data_files:
- split: train
path: data-0.95/tlf_Latn.txt
- config_name: tlh_Latn
data_files:
- split: train
path: data-0.95/tlh_Latn.txt
- config_name: tlj_Latn
data_files:
- split: train
path: data-0.95/tlj_Latn.txt
- config_name: tll_Latn
data_files:
- split: train
path: data-0.95/tll_Latn.txt
- config_name: tly_Latn
data_files:
- split: train
path: data-0.95/tly_Latn.txt
- config_name: tmc_Latn
data_files:
- split: train
path: data-0.95/tmc_Latn.txt
- config_name: tmd_Latn
data_files:
- split: train
path: data-0.95/tmd_Latn.txt
- config_name: tna_Latn
data_files:
- split: train
path: data-0.95/tna_Latn.txt
- config_name: tnc_Latn
data_files:
- split: train
path: data-0.95/tnc_Latn.txt
- config_name: tnk_Latn
data_files:
- split: train
path: data-0.95/tnk_Latn.txt
- config_name: tnn_Latn
data_files:
- split: train
path: data-0.95/tnn_Latn.txt
- config_name: tnp_Latn
data_files:
- split: train
path: data-0.95/tnp_Latn.txt
- config_name: tnr_Latn
data_files:
- split: train
path: data-0.95/tnr_Latn.txt
- config_name: tob_Latn
data_files:
- split: train
path: data-0.95/tob_Latn.txt
- config_name: toc_Latn
data_files:
- split: train
path: data-0.95/toc_Latn.txt
- config_name: tod_Latn
data_files:
- split: train
path: data-0.95/tod_Latn.txt
- config_name: tog_Latn
data_files:
- split: train
path: data-0.95/tog_Latn.txt
- config_name: toh_Latn
data_files:
- split: train
path: data-0.95/toh_Latn.txt
- config_name: toi_Latn
data_files:
- split: train
path: data-0.95/toi_Latn.txt
- config_name: toj_Latn
data_files:
- split: train
path: data-0.95/toj_Latn.txt
- config_name: tok_Latn
data_files:
- split: train
path: data-0.95/tok_Latn.txt
- config_name: ton_Latn
data_files:
- split: train
path: data-0.95/ton_Latn.txt
- config_name: too_Latn
data_files:
- split: train
path: data-0.95/too_Latn.txt
- config_name: top_Latn
data_files:
- split: train
path: data-0.95/top_Latn.txt
- config_name: tos_Latn
data_files:
- split: train
path: data-0.95/tos_Latn.txt
- config_name: tpa_Latn
data_files:
- split: train
path: data-0.95/tpa_Latn.txt
- config_name: tpi_Latn
data_files:
- split: train
path: data-0.95/tpi_Latn.txt
- config_name: tpm_Latn
data_files:
- split: train
path: data-0.95/tpm_Latn.txt
- config_name: tpn_Latn
data_files:
- split: train
path: data-0.95/tpn_Latn.txt
- config_name: tpp_Latn
data_files:
- split: train
path: data-0.95/tpp_Latn.txt
- config_name: tpt_Latn
data_files:
- split: train
path: data-0.95/tpt_Latn.txt
- config_name: tpw_Latn
data_files:
- split: train
path: data-0.95/tpw_Latn.txt
- config_name: tpz_Latn
data_files:
- split: train
path: data-0.95/tpz_Latn.txt
- config_name: tqo_Latn
data_files:
- split: train
path: data-0.95/tqo_Latn.txt
- config_name: trc_Latn
data_files:
- split: train
path: data-0.95/trc_Latn.txt
- config_name: trn_Latn
data_files:
- split: train
path: data-0.95/trn_Latn.txt
- config_name: tro_Latn
data_files:
- split: train
path: data-0.95/tro_Latn.txt
- config_name: trp_Latn
data_files:
- split: train
path: data-0.95/trp_Latn.txt
- config_name: trq_Latn
data_files:
- split: train
path: data-0.95/trq_Latn.txt
- config_name: trs_Latn
data_files:
- split: train
path: data-0.95/trs_Latn.txt
- config_name: trv_Latn
data_files:
- split: train
path: data-0.95/trv_Latn.txt
- config_name: tsc_Latn
data_files:
- split: train
path: data-0.95/tsc_Latn.txt
- config_name: tsg_Latn
data_files:
- split: train
path: data-0.95/tsg_Latn.txt
- config_name: tsn_Latn
data_files:
- split: train
path: data-0.95/tsn_Latn.txt
- config_name: tso_Latn
data_files:
- split: train
path: data-0.95/tso_Latn.txt
- config_name: tsw_Latn
data_files:
- split: train
path: data-0.95/tsw_Latn.txt
- config_name: tsz_Latn
data_files:
- split: train
path: data-0.95/tsz_Latn.txt
- config_name: ttc_Latn
data_files:
- split: train
path: data-0.95/ttc_Latn.txt
- config_name: tte_Latn
data_files:
- split: train
path: data-0.95/tte_Latn.txt
- config_name: ttj_Latn
data_files:
- split: train
path: data-0.95/ttj_Latn.txt
- config_name: ttq_Latn
data_files:
- split: train
path: data-0.95/ttq_Latn.txt
- config_name: ttq_Tfng
data_files:
- split: train
path: data-0.95/ttq_Tfng.txt
- config_name: tuc_Latn
data_files:
- split: train
path: data-0.95/tuc_Latn.txt
- config_name: tue_Latn
data_files:
- split: train
path: data-0.95/tue_Latn.txt
- config_name: tuf_Latn
data_files:
- split: train
path: data-0.95/tuf_Latn.txt
- config_name: tui_Latn
data_files:
- split: train
path: data-0.95/tui_Latn.txt
- config_name: tuk_Arab
data_files:
- split: train
path: data-0.95/tuk_Arab.txt
- config_name: tuk_Cyrl
data_files:
- split: train
path: data-0.95/tuk_Cyrl.txt
- config_name: tuk_Latn
data_files:
- split: train
path: data-0.95/tuk_Latn.txt
- config_name: tul_Latn
data_files:
- split: train
path: data-0.95/tul_Latn.txt
- config_name: tum_Latn
data_files:
- split: train
path: data-0.95/tum_Latn.txt
- config_name: tuo_Latn
data_files:
- split: train
path: data-0.95/tuo_Latn.txt
- config_name: tur_Latn
data_files:
- split: train
path: data-0.95/tur_Latn.txt
- config_name: tuv_Latn
data_files:
- split: train
path: data-0.95/tuv_Latn.txt
- config_name: tvk_Latn
data_files:
- split: train
path: data-0.95/tvk_Latn.txt
- config_name: tvl_Latn
data_files:
- split: train
path: data-0.95/tvl_Latn.txt
- config_name: twb_Latn
data_files:
- split: train
path: data-0.95/twb_Latn.txt
- config_name: twi_Latn
data_files:
- split: train
path: data-0.95/twi_Latn.txt
- config_name: twu_Latn
data_files:
- split: train
path: data-0.95/twu_Latn.txt
- config_name: twx_Latn
data_files:
- split: train
path: data-0.95/twx_Latn.txt
- config_name: txq_Latn
data_files:
- split: train
path: data-0.95/txq_Latn.txt
- config_name: txu_Latn
data_files:
- split: train
path: data-0.95/txu_Latn.txt
- config_name: tyv_Cyrl
data_files:
- split: train
path: data-0.95/tyv_Cyrl.txt
- config_name: tzh_Latn
data_files:
- split: train
path: data-0.95/tzh_Latn.txt
- config_name: tzj_Latn
data_files:
- split: train
path: data-0.95/tzj_Latn.txt
- config_name: tzl_Latn
data_files:
- split: train
path: data-0.95/tzl_Latn.txt
- config_name: tzm_Tfng
data_files:
- split: train
path: data-0.95/tzm_Tfng.txt
- config_name: tzo_Latn
data_files:
- split: train
path: data-0.95/tzo_Latn.txt
- config_name: ubr_Latn
data_files:
- split: train
path: data-0.95/ubr_Latn.txt
- config_name: ubu_Latn
data_files:
- split: train
path: data-0.95/ubu_Latn.txt
- config_name: udm_Cyrl
data_files:
- split: train
path: data-0.95/udm_Cyrl.txt
- config_name: udu_Latn
data_files:
- split: train
path: data-0.95/udu_Latn.txt
- config_name: uig_Arab
data_files:
- split: train
path: data-0.95/uig_Arab.txt
- config_name: uig_Cyrl
data_files:
- split: train
path: data-0.95/uig_Cyrl.txt
- config_name: uig_Latn
data_files:
- split: train
path: data-0.95/uig_Latn.txt
- config_name: ukr_Cyrl
data_files:
- split: train
path: data-0.95/ukr_Cyrl.txt
- config_name: umb_Latn
data_files:
- split: train
path: data-0.95/umb_Latn.txt
- config_name: upv_Latn
data_files:
- split: train
path: data-0.95/upv_Latn.txt
- config_name: ura_Latn
data_files:
- split: train
path: data-0.95/ura_Latn.txt
- config_name: urb_Latn
data_files:
- split: train
path: data-0.95/urb_Latn.txt
- config_name: urd_Arab
data_files:
- split: train
path: data-0.95/urd_Arab.txt
- config_name: urd_Latn
data_files:
- split: train
path: data-0.95/urd_Latn.txt
- config_name: urh_Latn
data_files:
- split: train
path: data-0.95/urh_Latn.txt
- config_name: uri_Latn
data_files:
- split: train
path: data-0.95/uri_Latn.txt
- config_name: urk_Thai
data_files:
- split: train
path: data-0.95/urk_Thai.txt
- config_name: urt_Latn
data_files:
- split: train
path: data-0.95/urt_Latn.txt
- config_name: urw_Latn
data_files:
- split: train
path: data-0.95/urw_Latn.txt
- config_name: ury_Latn
data_files:
- split: train
path: data-0.95/ury_Latn.txt
- config_name: usa_Latn
data_files:
- split: train
path: data-0.95/usa_Latn.txt
- config_name: usp_Latn
data_files:
- split: train
path: data-0.95/usp_Latn.txt
- config_name: uth_Latn
data_files:
- split: train
path: data-0.95/uth_Latn.txt
- config_name: uvh_Latn
data_files:
- split: train
path: data-0.95/uvh_Latn.txt
- config_name: uvl_Latn
data_files:
- split: train
path: data-0.95/uvl_Latn.txt
- config_name: uzn_Cyrl
data_files:
- split: train
path: data-0.95/uzn_Cyrl.txt
- config_name: uzn_Latn
data_files:
- split: train
path: data-0.95/uzn_Latn.txt
- config_name: uzs_Arab
data_files:
- split: train
path: data-0.95/uzs_Arab.txt
- config_name: vag_Latn
data_files:
- split: train
path: data-0.95/vag_Latn.txt
- config_name: vap_Latn
data_files:
- split: train
path: data-0.95/vap_Latn.txt
- config_name: var_Latn
data_files:
- split: train
path: data-0.95/var_Latn.txt
- config_name: vec_Latn
data_files:
- split: train
path: data-0.95/vec_Latn.txt
- config_name: ven_Latn
data_files:
- split: train
path: data-0.95/ven_Latn.txt
- config_name: vep_Latn
data_files:
- split: train
path: data-0.95/vep_Latn.txt
- config_name: vid_Latn
data_files:
- split: train
path: data-0.95/vid_Latn.txt
- config_name: vie_Latn
data_files:
- split: train
path: data-0.95/vie_Latn.txt
- config_name: viv_Latn
data_files:
- split: train
path: data-0.95/viv_Latn.txt
- config_name: vls_Latn
data_files:
- split: train
path: data-0.95/vls_Latn.txt
- config_name: vmk_Latn
data_files:
- split: train
path: data-0.95/vmk_Latn.txt
- config_name: vmw_Latn
data_files:
- split: train
path: data-0.95/vmw_Latn.txt
- config_name: vmy_Latn
data_files:
- split: train
path: data-0.95/vmy_Latn.txt
- config_name: vol_Latn
data_files:
- split: train
path: data-0.95/vol_Latn.txt
- config_name: vot_Latn
data_files:
- split: train
path: data-0.95/vot_Latn.txt
- config_name: vro_Latn
data_files:
- split: train
path: data-0.95/vro_Latn.txt
- config_name: vun_Latn
data_files:
- split: train
path: data-0.95/vun_Latn.txt
- config_name: vut_Latn
data_files:
- split: train
path: data-0.95/vut_Latn.txt
- config_name: waj_Latn
data_files:
- split: train
path: data-0.95/waj_Latn.txt
- config_name: wal_Ethi
data_files:
- split: train
path: data-0.95/wal_Ethi.txt
- config_name: wal_Latn
data_files:
- split: train
path: data-0.95/wal_Latn.txt
- config_name: wap_Latn
data_files:
- split: train
path: data-0.95/wap_Latn.txt
- config_name: war_Latn
data_files:
- split: train
path: data-0.95/war_Latn.txt
- config_name: wat_Latn
data_files:
- split: train
path: data-0.95/wat_Latn.txt
- config_name: way_Latn
data_files:
- split: train
path: data-0.95/way_Latn.txt
- config_name: wba_Latn
data_files:
- split: train
path: data-0.95/wba_Latn.txt
- config_name: wbm_Latn
data_files:
- split: train
path: data-0.95/wbm_Latn.txt
- config_name: wbp_Latn
data_files:
- split: train
path: data-0.95/wbp_Latn.txt
- config_name: wed_Latn
data_files:
- split: train
path: data-0.95/wed_Latn.txt
- config_name: wer_Latn
data_files:
- split: train
path: data-0.95/wer_Latn.txt
- config_name: wes_Latn
data_files:
- split: train
path: data-0.95/wes_Latn.txt
- config_name: wew_Latn
data_files:
- split: train
path: data-0.95/wew_Latn.txt
- config_name: whg_Latn
data_files:
- split: train
path: data-0.95/whg_Latn.txt
- config_name: whk_Latn
data_files:
- split: train
path: data-0.95/whk_Latn.txt
- config_name: wib_Latn
data_files:
- split: train
path: data-0.95/wib_Latn.txt
- config_name: wim_Latn
data_files:
- split: train
path: data-0.95/wim_Latn.txt
- config_name: wiu_Latn
data_files:
- split: train
path: data-0.95/wiu_Latn.txt
- config_name: wln_Latn
data_files:
- split: train
path: data-0.95/wln_Latn.txt
- config_name: wls_Latn
data_files:
- split: train
path: data-0.95/wls_Latn.txt
- config_name: wlv_Latn
data_files:
- split: train
path: data-0.95/wlv_Latn.txt
- config_name: wlx_Latn
data_files:
- split: train
path: data-0.95/wlx_Latn.txt
- config_name: wmt_Latn
data_files:
- split: train
path: data-0.95/wmt_Latn.txt
- config_name: wmw_Latn
data_files:
- split: train
path: data-0.95/wmw_Latn.txt
- config_name: wnc_Latn
data_files:
- split: train
path: data-0.95/wnc_Latn.txt
- config_name: wnu_Latn
data_files:
- split: train
path: data-0.95/wnu_Latn.txt
- config_name: wob_Latn
data_files:
- split: train
path: data-0.95/wob_Latn.txt
- config_name: wol_Latn
data_files:
- split: train
path: data-0.95/wol_Latn.txt
- config_name: wos_Latn
data_files:
- split: train
path: data-0.95/wos_Latn.txt
- config_name: wrk_Latn
data_files:
- split: train
path: data-0.95/wrk_Latn.txt
- config_name: wrs_Latn
data_files:
- split: train
path: data-0.95/wrs_Latn.txt
- config_name: wsg_Telu
data_files:
- split: train
path: data-0.95/wsg_Telu.txt
- config_name: wsk_Latn
data_files:
- split: train
path: data-0.95/wsk_Latn.txt
- config_name: wuu_Hani
data_files:
- split: train
path: data-0.95/wuu_Hani.txt
- config_name: wuv_Latn
data_files:
- split: train
path: data-0.95/wuv_Latn.txt
- config_name: wwa_Latn
data_files:
- split: train
path: data-0.95/wwa_Latn.txt
- config_name: xal_Cyrl
data_files:
- split: train
path: data-0.95/xal_Cyrl.txt
- config_name: xav_Latn
data_files:
- split: train
path: data-0.95/xav_Latn.txt
- config_name: xbi_Latn
data_files:
- split: train
path: data-0.95/xbi_Latn.txt
- config_name: xbr_Latn
data_files:
- split: train
path: data-0.95/xbr_Latn.txt
- config_name: xed_Latn
data_files:
- split: train
path: data-0.95/xed_Latn.txt
- config_name: xho_Latn
data_files:
- split: train
path: data-0.95/xho_Latn.txt
- config_name: xla_Latn
data_files:
- split: train
path: data-0.95/xla_Latn.txt
- config_name: xmf_Geor
data_files:
- split: train
path: data-0.95/xmf_Geor.txt
- config_name: xmm_Latn
data_files:
- split: train
path: data-0.95/xmm_Latn.txt
- config_name: xmv_Latn
data_files:
- split: train
path: data-0.95/xmv_Latn.txt
- config_name: xnn_Latn
data_files:
- split: train
path: data-0.95/xnn_Latn.txt
- config_name: xog_Latn
data_files:
- split: train
path: data-0.95/xog_Latn.txt
- config_name: xon_Latn
data_files:
- split: train
path: data-0.95/xon_Latn.txt
- config_name: xrb_Latn
data_files:
- split: train
path: data-0.95/xrb_Latn.txt
- config_name: xsb_Latn
data_files:
- split: train
path: data-0.95/xsb_Latn.txt
- config_name: xsi_Latn
data_files:
- split: train
path: data-0.95/xsi_Latn.txt
- config_name: xsm_Latn
data_files:
- split: train
path: data-0.95/xsm_Latn.txt
- config_name: xsr_Deva
data_files:
- split: train
path: data-0.95/xsr_Deva.txt
- config_name: xsu_Latn
data_files:
- split: train
path: data-0.95/xsu_Latn.txt
- config_name: xtd_Latn
data_files:
- split: train
path: data-0.95/xtd_Latn.txt
- config_name: xtm_Latn
data_files:
- split: train
path: data-0.95/xtm_Latn.txt
- config_name: xtn_Latn
data_files:
- split: train
path: data-0.95/xtn_Latn.txt
- config_name: xum_Latn
data_files:
- split: train
path: data-0.95/xum_Latn.txt
- config_name: xuo_Latn
data_files:
- split: train
path: data-0.95/xuo_Latn.txt
- config_name: yaa_Latn
data_files:
- split: train
path: data-0.95/yaa_Latn.txt
- config_name: yad_Latn
data_files:
- split: train
path: data-0.95/yad_Latn.txt
- config_name: yal_Latn
data_files:
- split: train
path: data-0.95/yal_Latn.txt
- config_name: yam_Latn
data_files:
- split: train
path: data-0.95/yam_Latn.txt
- config_name: yan_Latn
data_files:
- split: train
path: data-0.95/yan_Latn.txt
- config_name: yao_Latn
data_files:
- split: train
path: data-0.95/yao_Latn.txt
- config_name: yap_Latn
data_files:
- split: train
path: data-0.95/yap_Latn.txt
- config_name: yaq_Latn
data_files:
- split: train
path: data-0.95/yaq_Latn.txt
- config_name: yas_Latn
data_files:
- split: train
path: data-0.95/yas_Latn.txt
- config_name: yat_Latn
data_files:
- split: train
path: data-0.95/yat_Latn.txt
- config_name: yaz_Latn
data_files:
- split: train
path: data-0.95/yaz_Latn.txt
- config_name: ybb_Latn
data_files:
- split: train
path: data-0.95/ybb_Latn.txt
- config_name: yby_Latn
data_files:
- split: train
path: data-0.95/yby_Latn.txt
- config_name: ycn_Latn
data_files:
- split: train
path: data-0.95/ycn_Latn.txt
- config_name: ydd_Hebr
data_files:
- split: train
path: data-0.95/ydd_Hebr.txt
- config_name: yim_Latn
data_files:
- split: train
path: data-0.95/yim_Latn.txt
- config_name: yka_Latn
data_files:
- split: train
path: data-0.95/yka_Latn.txt
- config_name: yle_Latn
data_files:
- split: train
path: data-0.95/yle_Latn.txt
- config_name: yli_Latn
data_files:
- split: train
path: data-0.95/yli_Latn.txt
- config_name: yml_Latn
data_files:
- split: train
path: data-0.95/yml_Latn.txt
- config_name: yom_Latn
data_files:
- split: train
path: data-0.95/yom_Latn.txt
- config_name: yon_Latn
data_files:
- split: train
path: data-0.95/yon_Latn.txt
- config_name: yor_Latn
data_files:
- split: train
path: data-0.95/yor_Latn.txt
- config_name: yrb_Latn
data_files:
- split: train
path: data-0.95/yrb_Latn.txt
- config_name: yre_Latn
data_files:
- split: train
path: data-0.95/yre_Latn.txt
- config_name: yrk_Cyrl
data_files:
- split: train
path: data-0.95/yrk_Cyrl.txt
- config_name: yrl_Latn
data_files:
- split: train
path: data-0.95/yrl_Latn.txt
- config_name: yss_Latn
data_files:
- split: train
path: data-0.95/yss_Latn.txt
- config_name: yua_Latn
data_files:
- split: train
path: data-0.95/yua_Latn.txt
- config_name: yue_Hani
data_files:
- split: train
path: data-0.95/yue_Hani.txt
- config_name: yuj_Latn
data_files:
- split: train
path: data-0.95/yuj_Latn.txt
- config_name: yup_Latn
data_files:
- split: train
path: data-0.95/yup_Latn.txt
- config_name: yut_Latn
data_files:
- split: train
path: data-0.95/yut_Latn.txt
- config_name: yuw_Latn
data_files:
- split: train
path: data-0.95/yuw_Latn.txt
- config_name: yuz_Latn
data_files:
- split: train
path: data-0.95/yuz_Latn.txt
- config_name: yva_Latn
data_files:
- split: train
path: data-0.95/yva_Latn.txt
- config_name: zaa_Latn
data_files:
- split: train
path: data-0.95/zaa_Latn.txt
- config_name: zab_Latn
data_files:
- split: train
path: data-0.95/zab_Latn.txt
- config_name: zac_Latn
data_files:
- split: train
path: data-0.95/zac_Latn.txt
- config_name: zad_Latn
data_files:
- split: train
path: data-0.95/zad_Latn.txt
- config_name: zae_Latn
data_files:
- split: train
path: data-0.95/zae_Latn.txt
- config_name: zai_Latn
data_files:
- split: train
path: data-0.95/zai_Latn.txt
- config_name: zam_Latn
data_files:
- split: train
path: data-0.95/zam_Latn.txt
- config_name: zao_Latn
data_files:
- split: train
path: data-0.95/zao_Latn.txt
- config_name: zar_Latn
data_files:
- split: train
path: data-0.95/zar_Latn.txt
- config_name: zas_Latn
data_files:
- split: train
path: data-0.95/zas_Latn.txt
- config_name: zat_Latn
data_files:
- split: train
path: data-0.95/zat_Latn.txt
- config_name: zav_Latn
data_files:
- split: train
path: data-0.95/zav_Latn.txt
- config_name: zaw_Latn
data_files:
- split: train
path: data-0.95/zaw_Latn.txt
- config_name: zca_Latn
data_files:
- split: train
path: data-0.95/zca_Latn.txt
- config_name: zdj_Latn
data_files:
- split: train
path: data-0.95/zdj_Latn.txt
- config_name: zea_Latn
data_files:
- split: train
path: data-0.95/zea_Latn.txt
- config_name: zgh_Tfng
data_files:
- split: train
path: data-0.95/zgh_Tfng.txt
- config_name: zia_Latn
data_files:
- split: train
path: data-0.95/zia_Latn.txt
- config_name: ziw_Latn
data_files:
- split: train
path: data-0.95/ziw_Latn.txt
- config_name: zne_Latn
data_files:
- split: train
path: data-0.95/zne_Latn.txt
- config_name: zoc_Latn
data_files:
- split: train
path: data-0.95/zoc_Latn.txt
- config_name: zom_Latn
data_files:
- split: train
path: data-0.95/zom_Latn.txt
- config_name: zos_Latn
data_files:
- split: train
path: data-0.95/zos_Latn.txt
- config_name: zpa_Latn
data_files:
- split: train
path: data-0.95/zpa_Latn.txt
- config_name: zpc_Latn
data_files:
- split: train
path: data-0.95/zpc_Latn.txt
- config_name: zpg_Latn
data_files:
- split: train
path: data-0.95/zpg_Latn.txt
- config_name: zpi_Latn
data_files:
- split: train
path: data-0.95/zpi_Latn.txt
- config_name: zpj_Latn
data_files:
- split: train
path: data-0.95/zpj_Latn.txt
- config_name: zpl_Latn
data_files:
- split: train
path: data-0.95/zpl_Latn.txt
- config_name: zpm_Latn
data_files:
- split: train
path: data-0.95/zpm_Latn.txt
- config_name: zpo_Latn
data_files:
- split: train
path: data-0.95/zpo_Latn.txt
- config_name: zpq_Latn
data_files:
- split: train
path: data-0.95/zpq_Latn.txt
- config_name: zpt_Latn
data_files:
- split: train
path: data-0.95/zpt_Latn.txt
- config_name: zpu_Latn
data_files:
- split: train
path: data-0.95/zpu_Latn.txt
- config_name: zpv_Latn
data_files:
- split: train
path: data-0.95/zpv_Latn.txt
- config_name: zpz_Latn
data_files:
- split: train
path: data-0.95/zpz_Latn.txt
- config_name: zsm_Arab
data_files:
- split: train
path: data-0.95/zsm_Arab.txt
- config_name: zsm_Latn
data_files:
- split: train
path: data-0.95/zsm_Latn.txt
- config_name: zsr_Latn
data_files:
- split: train
path: data-0.95/zsr_Latn.txt
- config_name: ztq_Latn
data_files:
- split: train
path: data-0.95/ztq_Latn.txt
- config_name: zty_Latn
data_files:
- split: train
path: data-0.95/zty_Latn.txt
- config_name: zul_Latn
data_files:
- split: train
path: data-0.95/zul_Latn.txt
- config_name: zyb_Latn
data_files:
- split: train
path: data-0.95/zyb_Latn.txt
- config_name: zyp_Latn
data_files:
- split: train
path: data-0.95/zyp_Latn.txt
---
## GlotLID Wordlists
This is a set of wordlists extracted from the [GlotLID-corpus](https://huggingface.co/datasets/cis-lmu/glotlid-corpus) for high-precision filtering of [FineWeb2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2).
### Download
The recommended way to download the data is via `git clone`:
```sh
git clone https://huggingface.co/datasets/cis-lmu/glotlid-wordlists
```
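Once the repository is cloned, each config in the card metadata maps to one text file under `data-0.95/`, named `<lang>_<Script>.txt`. The following is a minimal sketch for reading one list from a local clone. It assumes one entry per line with the word as the first whitespace-separated field; the file format is not documented here, so check a file before relying on this. The names `wordlist_path`, `load_wordlist`, and `repo_dir` are hypothetical helpers, not part of the dataset.

```python
from pathlib import Path


def wordlist_path(repo_dir: str, config_name: str) -> Path:
    """Map a config name such as 'swe_Latn' to its file in a local clone."""
    return Path(repo_dir) / "data-0.95" / f"{config_name}.txt"


def load_wordlist(repo_dir: str, config_name: str) -> list[str]:
    """Read a wordlist, keeping file order (assumed: one entry per line)."""
    words = []
    with open(wordlist_path(repo_dir, config_name), encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if fields:  # skip blank lines
                words.append(fields[0])  # assumed: the word is the first field
    return words
```

For example, `load_wordlist("glotlid-wordlists", "swe_Latn")` would read `glotlid-wordlists/data-0.95/swe_Latn.txt`.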
### Method and Usage for Precision Filtering
For details on the filtering method, please refer to the [FineWeb2 paper](#).
Each listed word occurs significantly more often in the data for its own language than in the data for any other language.
Within a language's list, words are ordered by frequency: the higher a word ranks, the more often it appears in that language.
See the [`filter.py`](https://huggingface.co/datasets/cis-lmu/glotlid-wordlists/blob/main/filter.py) script for an example of how to use the wordlists in practice.
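To illustrate the general idea, here is a small sketch that scores a document by the share of its tokens found in the target language's wordlist and keeps it only if that share clears a threshold. This is not the logic of `filter.py`; whitespace tokenization, lowercasing, the `0.1` threshold, and the function names are all illustrative assumptions.

```python
def wordlist_hit_ratio(text: str, wordlist: set[str]) -> float:
    """Fraction of whitespace-separated tokens that appear in the wordlist."""
    tokens = text.lower().split()  # assumed: simple lowercase whitespace tokens
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in wordlist)
    return hits / len(tokens)


def keep_document(text: str, wordlist: set[str], min_ratio: float = 0.1) -> bool:
    """Keep a document if enough of its tokens are language-specific words."""
    # min_ratio is a placeholder value, not a recommended setting.
    return wordlist_hit_ratio(text, wordlist) >= min_ratio
```

Because each list contains only words that are distinctive for one language, a low ratio for a document that a classifier labeled with that language is a hint that the label may be wrong. A suitable threshold would need to be tuned per language.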
### Citation
If you find the wordlists useful, please cite both the GlotLID and FineWeb2 papers:
**GlotLID paper** ([link](https://arxiv.org/abs/2310.16248)):
```bibtex
@inproceedings{
kargaran2023glotlid,
title={{GlotLID}: Language Identification for Low-Resource Languages},
author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023},
url={https://openreview.net/forum?id=dl4e3EBz5j}
}
```
**FineWeb2 paper** ([link](#)):
```bibtex
Add BibTeX when available
```
Provider: kargaranamir



