
jaygala24/xsimplusplus

Source: Hugging Face. Updated 2024-05-01; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/jaygala24/xsimplusplus
Resource description:
--- annotations_creators: - derived language: - ace - ace - acm - acq - aeb - afr - ajp - aka - als - amh - apc - arb - arb - ars - ary - arz - asm - ast - awa - ayr - azb - azj - bak - bam - ban - bel - bem - ben - bho - bjn - bjn - bod - bos - bug - bul - cat - ceb - ces - cjk - ckb - crh - cym - dan - deu - dik - dyu - dzo - ell - eng - epo - est - eus - ewe - fao - fij - fin - fon - fra - fur - fuv - gaz - gla - gle - glg - grn - guj - hat - hau - heb - hin - hne - hrv - hun - hye - ibo - ilo - ind - isl - ita - jav - jpn - kab - kac - kam - kan - kas - kas - kat - kaz - kbp - kea - khk - khm - kik - kin - kir - kmb - kmr - knc - knc - kon - kor - lao - lij - lim - lin - lit - lmo - ltg - ltz - lua - lug - luo - lus - lvs - mag - mai - mal - mar - min - min - mkd - mlt - mni - mos - mri - mya - nld - nno - nob - npi - nso - nus - nya - oci - ory - pag - pan - pap - pbt - pes - plt - pol - por - prs - quy - ron - run - rus - sag - san - sat - scn - shn - sin - slk - slv - smo - sna - snd - som - sot - spa - srd - srp - ssw - sun - swe - swh - szl - tam - taq - taq - tat - tel - tgk - tgl - tha - tir - tpi - tsn - tso - tuk - tum - tur - twi - tzm - uig - ukr - umb - urd - uzn - vec - vie - war - wol - xho - ydd - yor - yue - zho - zho - zsm - zul license: - cc0-1.0 - other multilinguality: - multilingual pretty_name: xsimplusplus size_categories: - 1K<n<400K configs: - config_name: default data_files: - split: dev path: data/default/dev.parquet - split: devtest path: data/default/devtest.parquet - config_name: ace_Arab data_files: - split: dev path: data/eng_Latn-ace_Arab/dev.parquet - split: devtest path: data/eng_Latn-ace_Arab/devtest.parquet - config_name: ace_Latn data_files: - split: dev path: data/eng_Latn-ace_Latn/dev.parquet - split: devtest path: data/eng_Latn-ace_Latn/devtest.parquet - config_name: acm_Arab data_files: - split: dev path: data/eng_Latn-acm_Arab/dev.parquet - split: devtest path: data/eng_Latn-acm_Arab/devtest.parquet - config_name: 
acq_Arab data_files: - split: dev path: data/eng_Latn-acq_Arab/dev.parquet - split: devtest path: data/eng_Latn-acq_Arab/devtest.parquet - config_name: aeb_Arab data_files: - split: dev path: data/eng_Latn-aeb_Arab/dev.parquet - split: devtest path: data/eng_Latn-aeb_Arab/devtest.parquet - config_name: afr_Latn data_files: - split: dev path: data/eng_Latn-afr_Latn/dev.parquet - split: devtest path: data/eng_Latn-afr_Latn/devtest.parquet - config_name: ajp_Arab data_files: - split: dev path: data/eng_Latn-ajp_Arab/dev.parquet - split: devtest path: data/eng_Latn-ajp_Arab/devtest.parquet - config_name: aka_Latn data_files: - split: dev path: data/eng_Latn-aka_Latn/dev.parquet - split: devtest path: data/eng_Latn-aka_Latn/devtest.parquet - config_name: als_Latn data_files: - split: dev path: data/eng_Latn-als_Latn/dev.parquet - split: devtest path: data/eng_Latn-als_Latn/devtest.parquet - config_name: amh_Ethi data_files: - split: dev path: data/eng_Latn-amh_Ethi/dev.parquet - split: devtest path: data/eng_Latn-amh_Ethi/devtest.parquet - config_name: apc_Arab data_files: - split: dev path: data/eng_Latn-apc_Arab/dev.parquet - split: devtest path: data/eng_Latn-apc_Arab/devtest.parquet - config_name: arb_Arab data_files: - split: dev path: data/eng_Latn-arb_Arab/dev.parquet - split: devtest path: data/eng_Latn-arb_Arab/devtest.parquet - config_name: arb_Latn data_files: - split: dev path: data/eng_Latn-arb_Latn/dev.parquet - split: devtest path: data/eng_Latn-arb_Latn/devtest.parquet - config_name: ars_Arab data_files: - split: dev path: data/eng_Latn-ars_Arab/dev.parquet - split: devtest path: data/eng_Latn-ars_Arab/devtest.parquet - config_name: ary_Arab data_files: - split: dev path: data/eng_Latn-ary_Arab/dev.parquet - split: devtest path: data/eng_Latn-ary_Arab/devtest.parquet - config_name: arz_Arab data_files: - split: dev path: data/eng_Latn-arz_Arab/dev.parquet - split: devtest path: data/eng_Latn-arz_Arab/devtest.parquet - config_name: asm_Beng data_files: - 
split: dev path: data/eng_Latn-asm_Beng/dev.parquet - split: devtest path: data/eng_Latn-asm_Beng/devtest.parquet - config_name: ast_Latn data_files: - split: dev path: data/eng_Latn-ast_Latn/dev.parquet - split: devtest path: data/eng_Latn-ast_Latn/devtest.parquet - config_name: awa_Deva data_files: - split: dev path: data/eng_Latn-awa_Deva/dev.parquet - split: devtest path: data/eng_Latn-awa_Deva/devtest.parquet - config_name: ayr_Latn data_files: - split: dev path: data/eng_Latn-ayr_Latn/dev.parquet - split: devtest path: data/eng_Latn-ayr_Latn/devtest.parquet - config_name: azb_Arab data_files: - split: dev path: data/eng_Latn-azb_Arab/dev.parquet - split: devtest path: data/eng_Latn-azb_Arab/devtest.parquet - config_name: azj_Latn data_files: - split: dev path: data/eng_Latn-azj_Latn/dev.parquet - split: devtest path: data/eng_Latn-azj_Latn/devtest.parquet - config_name: bak_Cyrl data_files: - split: dev path: data/eng_Latn-bak_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-bak_Cyrl/devtest.parquet - config_name: bam_Latn data_files: - split: dev path: data/eng_Latn-bam_Latn/dev.parquet - split: devtest path: data/eng_Latn-bam_Latn/devtest.parquet - config_name: ban_Latn data_files: - split: dev path: data/eng_Latn-ban_Latn/dev.parquet - split: devtest path: data/eng_Latn-ban_Latn/devtest.parquet - config_name: bel_Cyrl data_files: - split: dev path: data/eng_Latn-bel_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-bel_Cyrl/devtest.parquet - config_name: bem_Latn data_files: - split: dev path: data/eng_Latn-bem_Latn/dev.parquet - split: devtest path: data/eng_Latn-bem_Latn/devtest.parquet - config_name: ben_Beng data_files: - split: dev path: data/eng_Latn-ben_Beng/dev.parquet - split: devtest path: data/eng_Latn-ben_Beng/devtest.parquet - config_name: bho_Deva data_files: - split: dev path: data/eng_Latn-bho_Deva/dev.parquet - split: devtest path: data/eng_Latn-bho_Deva/devtest.parquet - config_name: bjn_Arab data_files: - split: dev path: 
data/eng_Latn-bjn_Arab/dev.parquet - split: devtest path: data/eng_Latn-bjn_Arab/devtest.parquet - config_name: bjn_Latn data_files: - split: dev path: data/eng_Latn-bjn_Latn/dev.parquet - split: devtest path: data/eng_Latn-bjn_Latn/devtest.parquet - config_name: bod_Tibt data_files: - split: dev path: data/eng_Latn-bod_Tibt/dev.parquet - split: devtest path: data/eng_Latn-bod_Tibt/devtest.parquet - config_name: bos_Latn data_files: - split: dev path: data/eng_Latn-bos_Latn/dev.parquet - split: devtest path: data/eng_Latn-bos_Latn/devtest.parquet - config_name: bug_Latn data_files: - split: dev path: data/eng_Latn-bug_Latn/dev.parquet - split: devtest path: data/eng_Latn-bug_Latn/devtest.parquet - config_name: bul_Cyrl data_files: - split: dev path: data/eng_Latn-bul_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-bul_Cyrl/devtest.parquet - config_name: cat_Latn data_files: - split: dev path: data/eng_Latn-cat_Latn/dev.parquet - split: devtest path: data/eng_Latn-cat_Latn/devtest.parquet - config_name: ceb_Latn data_files: - split: dev path: data/eng_Latn-ceb_Latn/dev.parquet - split: devtest path: data/eng_Latn-ceb_Latn/devtest.parquet - config_name: ces_Latn data_files: - split: dev path: data/eng_Latn-ces_Latn/dev.parquet - split: devtest path: data/eng_Latn-ces_Latn/devtest.parquet - config_name: cjk_Latn data_files: - split: dev path: data/eng_Latn-cjk_Latn/dev.parquet - split: devtest path: data/eng_Latn-cjk_Latn/devtest.parquet - config_name: ckb_Arab data_files: - split: dev path: data/eng_Latn-ckb_Arab/dev.parquet - split: devtest path: data/eng_Latn-ckb_Arab/devtest.parquet - config_name: crh_Latn data_files: - split: dev path: data/eng_Latn-crh_Latn/dev.parquet - split: devtest path: data/eng_Latn-crh_Latn/devtest.parquet - config_name: cym_Latn data_files: - split: dev path: data/eng_Latn-cym_Latn/dev.parquet - split: devtest path: data/eng_Latn-cym_Latn/devtest.parquet - config_name: dan_Latn data_files: - split: dev path: 
data/eng_Latn-dan_Latn/dev.parquet - split: devtest path: data/eng_Latn-dan_Latn/devtest.parquet - config_name: deu_Latn data_files: - split: dev path: data/eng_Latn-deu_Latn/dev.parquet - split: devtest path: data/eng_Latn-deu_Latn/devtest.parquet - config_name: dik_Latn data_files: - split: dev path: data/eng_Latn-dik_Latn/dev.parquet - split: devtest path: data/eng_Latn-dik_Latn/devtest.parquet - config_name: dyu_Latn data_files: - split: dev path: data/eng_Latn-dyu_Latn/dev.parquet - split: devtest path: data/eng_Latn-dyu_Latn/devtest.parquet - config_name: dzo_Tibt data_files: - split: dev path: data/eng_Latn-dzo_Tibt/dev.parquet - split: devtest path: data/eng_Latn-dzo_Tibt/devtest.parquet - config_name: ell_Grek data_files: - split: dev path: data/eng_Latn-ell_Grek/dev.parquet - split: devtest path: data/eng_Latn-ell_Grek/devtest.parquet - config_name: epo_Latn data_files: - split: dev path: data/eng_Latn-epo_Latn/dev.parquet - split: devtest path: data/eng_Latn-epo_Latn/devtest.parquet - config_name: est_Latn data_files: - split: dev path: data/eng_Latn-est_Latn/dev.parquet - split: devtest path: data/eng_Latn-est_Latn/devtest.parquet - config_name: eus_Latn data_files: - split: dev path: data/eng_Latn-eus_Latn/dev.parquet - split: devtest path: data/eng_Latn-eus_Latn/devtest.parquet - config_name: ewe_Latn data_files: - split: dev path: data/eng_Latn-ewe_Latn/dev.parquet - split: devtest path: data/eng_Latn-ewe_Latn/devtest.parquet - config_name: fao_Latn data_files: - split: dev path: data/eng_Latn-fao_Latn/dev.parquet - split: devtest path: data/eng_Latn-fao_Latn/devtest.parquet - config_name: fij_Latn data_files: - split: dev path: data/eng_Latn-fij_Latn/dev.parquet - split: devtest path: data/eng_Latn-fij_Latn/devtest.parquet - config_name: fin_Latn data_files: - split: dev path: data/eng_Latn-fin_Latn/dev.parquet - split: devtest path: data/eng_Latn-fin_Latn/devtest.parquet - config_name: fon_Latn data_files: - split: dev path: 
data/eng_Latn-fon_Latn/dev.parquet - split: devtest path: data/eng_Latn-fon_Latn/devtest.parquet - config_name: fra_Latn data_files: - split: dev path: data/eng_Latn-fra_Latn/dev.parquet - split: devtest path: data/eng_Latn-fra_Latn/devtest.parquet - config_name: fur_Latn data_files: - split: dev path: data/eng_Latn-fur_Latn/dev.parquet - split: devtest path: data/eng_Latn-fur_Latn/devtest.parquet - config_name: fuv_Latn data_files: - split: dev path: data/eng_Latn-fuv_Latn/dev.parquet - split: devtest path: data/eng_Latn-fuv_Latn/devtest.parquet - config_name: gaz_Latn data_files: - split: dev path: data/eng_Latn-gaz_Latn/dev.parquet - split: devtest path: data/eng_Latn-gaz_Latn/devtest.parquet - config_name: gla_Latn data_files: - split: dev path: data/eng_Latn-gla_Latn/dev.parquet - split: devtest path: data/eng_Latn-gla_Latn/devtest.parquet - config_name: gle_Latn data_files: - split: dev path: data/eng_Latn-gle_Latn/dev.parquet - split: devtest path: data/eng_Latn-gle_Latn/devtest.parquet - config_name: glg_Latn data_files: - split: dev path: data/eng_Latn-glg_Latn/dev.parquet - split: devtest path: data/eng_Latn-glg_Latn/devtest.parquet - config_name: grn_Latn data_files: - split: dev path: data/eng_Latn-grn_Latn/dev.parquet - split: devtest path: data/eng_Latn-grn_Latn/devtest.parquet - config_name: guj_Gujr data_files: - split: dev path: data/eng_Latn-guj_Gujr/dev.parquet - split: devtest path: data/eng_Latn-guj_Gujr/devtest.parquet - config_name: hat_Latn data_files: - split: dev path: data/eng_Latn-hat_Latn/dev.parquet - split: devtest path: data/eng_Latn-hat_Latn/devtest.parquet - config_name: hau_Latn data_files: - split: dev path: data/eng_Latn-hau_Latn/dev.parquet - split: devtest path: data/eng_Latn-hau_Latn/devtest.parquet - config_name: heb_Hebr data_files: - split: dev path: data/eng_Latn-heb_Hebr/dev.parquet - split: devtest path: data/eng_Latn-heb_Hebr/devtest.parquet - config_name: hin_Deva data_files: - split: dev path: 
data/eng_Latn-hin_Deva/dev.parquet - split: devtest path: data/eng_Latn-hin_Deva/devtest.parquet - config_name: hne_Deva data_files: - split: dev path: data/eng_Latn-hne_Deva/dev.parquet - split: devtest path: data/eng_Latn-hne_Deva/devtest.parquet - config_name: hrv_Latn data_files: - split: dev path: data/eng_Latn-hrv_Latn/dev.parquet - split: devtest path: data/eng_Latn-hrv_Latn/devtest.parquet - config_name: hun_Latn data_files: - split: dev path: data/eng_Latn-hun_Latn/dev.parquet - split: devtest path: data/eng_Latn-hun_Latn/devtest.parquet - config_name: hye_Armn data_files: - split: dev path: data/eng_Latn-hye_Armn/dev.parquet - split: devtest path: data/eng_Latn-hye_Armn/devtest.parquet - config_name: ibo_Latn data_files: - split: dev path: data/eng_Latn-ibo_Latn/dev.parquet - split: devtest path: data/eng_Latn-ibo_Latn/devtest.parquet - config_name: ilo_Latn data_files: - split: dev path: data/eng_Latn-ilo_Latn/dev.parquet - split: devtest path: data/eng_Latn-ilo_Latn/devtest.parquet - config_name: ind_Latn data_files: - split: dev path: data/eng_Latn-ind_Latn/dev.parquet - split: devtest path: data/eng_Latn-ind_Latn/devtest.parquet - config_name: isl_Latn data_files: - split: dev path: data/eng_Latn-isl_Latn/dev.parquet - split: devtest path: data/eng_Latn-isl_Latn/devtest.parquet - config_name: ita_Latn data_files: - split: dev path: data/eng_Latn-ita_Latn/dev.parquet - split: devtest path: data/eng_Latn-ita_Latn/devtest.parquet - config_name: jav_Latn data_files: - split: dev path: data/eng_Latn-jav_Latn/dev.parquet - split: devtest path: data/eng_Latn-jav_Latn/devtest.parquet - config_name: jpn_Jpan data_files: - split: dev path: data/eng_Latn-jpn_Jpan/dev.parquet - split: devtest path: data/eng_Latn-jpn_Jpan/devtest.parquet - config_name: kab_Latn data_files: - split: dev path: data/eng_Latn-kab_Latn/dev.parquet - split: devtest path: data/eng_Latn-kab_Latn/devtest.parquet - config_name: kac_Latn data_files: - split: dev path: 
data/eng_Latn-kac_Latn/dev.parquet - split: devtest path: data/eng_Latn-kac_Latn/devtest.parquet - config_name: kam_Latn data_files: - split: dev path: data/eng_Latn-kam_Latn/dev.parquet - split: devtest path: data/eng_Latn-kam_Latn/devtest.parquet - config_name: kan_Knda data_files: - split: dev path: data/eng_Latn-kan_Knda/dev.parquet - split: devtest path: data/eng_Latn-kan_Knda/devtest.parquet - config_name: kas_Arab data_files: - split: dev path: data/eng_Latn-kas_Arab/dev.parquet - split: devtest path: data/eng_Latn-kas_Arab/devtest.parquet - config_name: kas_Deva data_files: - split: dev path: data/eng_Latn-kas_Deva/dev.parquet - split: devtest path: data/eng_Latn-kas_Deva/devtest.parquet - config_name: kat_Geor data_files: - split: dev path: data/eng_Latn-kat_Geor/dev.parquet - split: devtest path: data/eng_Latn-kat_Geor/devtest.parquet - config_name: kaz_Cyrl data_files: - split: dev path: data/eng_Latn-kaz_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-kaz_Cyrl/devtest.parquet - config_name: kbp_Latn data_files: - split: dev path: data/eng_Latn-kbp_Latn/dev.parquet - split: devtest path: data/eng_Latn-kbp_Latn/devtest.parquet - config_name: kea_Latn data_files: - split: dev path: data/eng_Latn-kea_Latn/dev.parquet - split: devtest path: data/eng_Latn-kea_Latn/devtest.parquet - config_name: khk_Cyrl data_files: - split: dev path: data/eng_Latn-khk_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-khk_Cyrl/devtest.parquet - config_name: khm_Khmr data_files: - split: dev path: data/eng_Latn-khm_Khmr/dev.parquet - split: devtest path: data/eng_Latn-khm_Khmr/devtest.parquet - config_name: kik_Latn data_files: - split: dev path: data/eng_Latn-kik_Latn/dev.parquet - split: devtest path: data/eng_Latn-kik_Latn/devtest.parquet - config_name: kin_Latn data_files: - split: dev path: data/eng_Latn-kin_Latn/dev.parquet - split: devtest path: data/eng_Latn-kin_Latn/devtest.parquet - config_name: kir_Cyrl data_files: - split: dev path: 
data/eng_Latn-kir_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-kir_Cyrl/devtest.parquet - config_name: kmb_Latn data_files: - split: dev path: data/eng_Latn-kmb_Latn/dev.parquet - split: devtest path: data/eng_Latn-kmb_Latn/devtest.parquet - config_name: kmr_Latn data_files: - split: dev path: data/eng_Latn-kmr_Latn/dev.parquet - split: devtest path: data/eng_Latn-kmr_Latn/devtest.parquet - config_name: knc_Arab data_files: - split: dev path: data/eng_Latn-knc_Arab/dev.parquet - split: devtest path: data/eng_Latn-knc_Arab/devtest.parquet - config_name: knc_Latn data_files: - split: dev path: data/eng_Latn-knc_Latn/dev.parquet - split: devtest path: data/eng_Latn-knc_Latn/devtest.parquet - config_name: kon_Latn data_files: - split: dev path: data/eng_Latn-kon_Latn/dev.parquet - split: devtest path: data/eng_Latn-kon_Latn/devtest.parquet - config_name: kor_Hang data_files: - split: dev path: data/eng_Latn-kor_Hang/dev.parquet - split: devtest path: data/eng_Latn-kor_Hang/devtest.parquet - config_name: lao_Laoo data_files: - split: dev path: data/eng_Latn-lao_Laoo/dev.parquet - split: devtest path: data/eng_Latn-lao_Laoo/devtest.parquet - config_name: lij_Latn data_files: - split: dev path: data/eng_Latn-lij_Latn/dev.parquet - split: devtest path: data/eng_Latn-lij_Latn/devtest.parquet - config_name: lim_Latn data_files: - split: dev path: data/eng_Latn-lim_Latn/dev.parquet - split: devtest path: data/eng_Latn-lim_Latn/devtest.parquet - config_name: lin_Latn data_files: - split: dev path: data/eng_Latn-lin_Latn/dev.parquet - split: devtest path: data/eng_Latn-lin_Latn/devtest.parquet - config_name: lit_Latn data_files: - split: dev path: data/eng_Latn-lit_Latn/dev.parquet - split: devtest path: data/eng_Latn-lit_Latn/devtest.parquet - config_name: lmo_Latn data_files: - split: dev path: data/eng_Latn-lmo_Latn/dev.parquet - split: devtest path: data/eng_Latn-lmo_Latn/devtest.parquet - config_name: ltg_Latn data_files: - split: dev path: 
data/eng_Latn-ltg_Latn/dev.parquet - split: devtest path: data/eng_Latn-ltg_Latn/devtest.parquet - config_name: ltz_Latn data_files: - split: dev path: data/eng_Latn-ltz_Latn/dev.parquet - split: devtest path: data/eng_Latn-ltz_Latn/devtest.parquet - config_name: lua_Latn data_files: - split: dev path: data/eng_Latn-lua_Latn/dev.parquet - split: devtest path: data/eng_Latn-lua_Latn/devtest.parquet - config_name: lug_Latn data_files: - split: dev path: data/eng_Latn-lug_Latn/dev.parquet - split: devtest path: data/eng_Latn-lug_Latn/devtest.parquet - config_name: luo_Latn data_files: - split: dev path: data/eng_Latn-luo_Latn/dev.parquet - split: devtest path: data/eng_Latn-luo_Latn/devtest.parquet - config_name: lus_Latn data_files: - split: dev path: data/eng_Latn-lus_Latn/dev.parquet - split: devtest path: data/eng_Latn-lus_Latn/devtest.parquet - config_name: lvs_Latn data_files: - split: dev path: data/eng_Latn-lvs_Latn/dev.parquet - split: devtest path: data/eng_Latn-lvs_Latn/devtest.parquet - config_name: mag_Deva data_files: - split: dev path: data/eng_Latn-mag_Deva/dev.parquet - split: devtest path: data/eng_Latn-mag_Deva/devtest.parquet - config_name: mai_Deva data_files: - split: dev path: data/eng_Latn-mai_Deva/dev.parquet - split: devtest path: data/eng_Latn-mai_Deva/devtest.parquet - config_name: mal_Mlym data_files: - split: dev path: data/eng_Latn-mal_Mlym/dev.parquet - split: devtest path: data/eng_Latn-mal_Mlym/devtest.parquet - config_name: mar_Deva data_files: - split: dev path: data/eng_Latn-mar_Deva/dev.parquet - split: devtest path: data/eng_Latn-mar_Deva/devtest.parquet - config_name: min_Arab data_files: - split: dev path: data/eng_Latn-min_Arab/dev.parquet - split: devtest path: data/eng_Latn-min_Arab/devtest.parquet - config_name: min_Latn data_files: - split: dev path: data/eng_Latn-min_Latn/dev.parquet - split: devtest path: data/eng_Latn-min_Latn/devtest.parquet - config_name: mkd_Cyrl data_files: - split: dev path: 
data/eng_Latn-mkd_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-mkd_Cyrl/devtest.parquet - config_name: mlt_Latn data_files: - split: dev path: data/eng_Latn-mlt_Latn/dev.parquet - split: devtest path: data/eng_Latn-mlt_Latn/devtest.parquet - config_name: mni_Beng data_files: - split: dev path: data/eng_Latn-mni_Beng/dev.parquet - split: devtest path: data/eng_Latn-mni_Beng/devtest.parquet - config_name: mos_Latn data_files: - split: dev path: data/eng_Latn-mos_Latn/dev.parquet - split: devtest path: data/eng_Latn-mos_Latn/devtest.parquet - config_name: mri_Latn data_files: - split: dev path: data/eng_Latn-mri_Latn/dev.parquet - split: devtest path: data/eng_Latn-mri_Latn/devtest.parquet - config_name: mya_Mymr data_files: - split: dev path: data/eng_Latn-mya_Mymr/dev.parquet - split: devtest path: data/eng_Latn-mya_Mymr/devtest.parquet - config_name: nld_Latn data_files: - split: dev path: data/eng_Latn-nld_Latn/dev.parquet - split: devtest path: data/eng_Latn-nld_Latn/devtest.parquet - config_name: nno_Latn data_files: - split: dev path: data/eng_Latn-nno_Latn/dev.parquet - split: devtest path: data/eng_Latn-nno_Latn/devtest.parquet - config_name: nob_Latn data_files: - split: dev path: data/eng_Latn-nob_Latn/dev.parquet - split: devtest path: data/eng_Latn-nob_Latn/devtest.parquet - config_name: npi_Deva data_files: - split: dev path: data/eng_Latn-npi_Deva/dev.parquet - split: devtest path: data/eng_Latn-npi_Deva/devtest.parquet - config_name: nso_Latn data_files: - split: dev path: data/eng_Latn-nso_Latn/dev.parquet - split: devtest path: data/eng_Latn-nso_Latn/devtest.parquet - config_name: nus_Latn data_files: - split: dev path: data/eng_Latn-nus_Latn/dev.parquet - split: devtest path: data/eng_Latn-nus_Latn/devtest.parquet - config_name: nya_Latn data_files: - split: dev path: data/eng_Latn-nya_Latn/dev.parquet - split: devtest path: data/eng_Latn-nya_Latn/devtest.parquet - config_name: oci_Latn data_files: - split: dev path: 
data/eng_Latn-oci_Latn/dev.parquet - split: devtest path: data/eng_Latn-oci_Latn/devtest.parquet - config_name: ory_Orya data_files: - split: dev path: data/eng_Latn-ory_Orya/dev.parquet - split: devtest path: data/eng_Latn-ory_Orya/devtest.parquet - config_name: pag_Latn data_files: - split: dev path: data/eng_Latn-pag_Latn/dev.parquet - split: devtest path: data/eng_Latn-pag_Latn/devtest.parquet - config_name: pan_Guru data_files: - split: dev path: data/eng_Latn-pan_Guru/dev.parquet - split: devtest path: data/eng_Latn-pan_Guru/devtest.parquet - config_name: pap_Latn data_files: - split: dev path: data/eng_Latn-pap_Latn/dev.parquet - split: devtest path: data/eng_Latn-pap_Latn/devtest.parquet - config_name: pbt_Arab data_files: - split: dev path: data/eng_Latn-pbt_Arab/dev.parquet - split: devtest path: data/eng_Latn-pbt_Arab/devtest.parquet - config_name: pes_Arab data_files: - split: dev path: data/eng_Latn-pes_Arab/dev.parquet - split: devtest path: data/eng_Latn-pes_Arab/devtest.parquet - config_name: plt_Latn data_files: - split: dev path: data/eng_Latn-plt_Latn/dev.parquet - split: devtest path: data/eng_Latn-plt_Latn/devtest.parquet - config_name: pol_Latn data_files: - split: dev path: data/eng_Latn-pol_Latn/dev.parquet - split: devtest path: data/eng_Latn-pol_Latn/devtest.parquet - config_name: por_Latn data_files: - split: dev path: data/eng_Latn-por_Latn/dev.parquet - split: devtest path: data/eng_Latn-por_Latn/devtest.parquet - config_name: prs_Arab data_files: - split: dev path: data/eng_Latn-prs_Arab/dev.parquet - split: devtest path: data/eng_Latn-prs_Arab/devtest.parquet - config_name: quy_Latn data_files: - split: dev path: data/eng_Latn-quy_Latn/dev.parquet - split: devtest path: data/eng_Latn-quy_Latn/devtest.parquet - config_name: ron_Latn data_files: - split: dev path: data/eng_Latn-ron_Latn/dev.parquet - split: devtest path: data/eng_Latn-ron_Latn/devtest.parquet - config_name: run_Latn data_files: - split: dev path: 
data/eng_Latn-run_Latn/dev.parquet - split: devtest path: data/eng_Latn-run_Latn/devtest.parquet - config_name: rus_Cyrl data_files: - split: dev path: data/eng_Latn-rus_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-rus_Cyrl/devtest.parquet - config_name: sag_Latn data_files: - split: dev path: data/eng_Latn-sag_Latn/dev.parquet - split: devtest path: data/eng_Latn-sag_Latn/devtest.parquet - config_name: san_Deva data_files: - split: dev path: data/eng_Latn-san_Deva/dev.parquet - split: devtest path: data/eng_Latn-san_Deva/devtest.parquet - config_name: sat_Olck data_files: - split: dev path: data/eng_Latn-sat_Olck/dev.parquet - split: devtest path: data/eng_Latn-sat_Olck/devtest.parquet - config_name: scn_Latn data_files: - split: dev path: data/eng_Latn-scn_Latn/dev.parquet - split: devtest path: data/eng_Latn-scn_Latn/devtest.parquet - config_name: shn_Mymr data_files: - split: dev path: data/eng_Latn-shn_Mymr/dev.parquet - split: devtest path: data/eng_Latn-shn_Mymr/devtest.parquet - config_name: sin_Sinh data_files: - split: dev path: data/eng_Latn-sin_Sinh/dev.parquet - split: devtest path: data/eng_Latn-sin_Sinh/devtest.parquet - config_name: slk_Latn data_files: - split: dev path: data/eng_Latn-slk_Latn/dev.parquet - split: devtest path: data/eng_Latn-slk_Latn/devtest.parquet - config_name: slv_Latn data_files: - split: dev path: data/eng_Latn-slv_Latn/dev.parquet - split: devtest path: data/eng_Latn-slv_Latn/devtest.parquet - config_name: smo_Latn data_files: - split: dev path: data/eng_Latn-smo_Latn/dev.parquet - split: devtest path: data/eng_Latn-smo_Latn/devtest.parquet - config_name: sna_Latn data_files: - split: dev path: data/eng_Latn-sna_Latn/dev.parquet - split: devtest path: data/eng_Latn-sna_Latn/devtest.parquet - config_name: snd_Arab data_files: - split: dev path: data/eng_Latn-snd_Arab/dev.parquet - split: devtest path: data/eng_Latn-snd_Arab/devtest.parquet - config_name: som_Latn data_files: - split: dev path: 
data/eng_Latn-som_Latn/dev.parquet - split: devtest path: data/eng_Latn-som_Latn/devtest.parquet - config_name: sot_Latn data_files: - split: dev path: data/eng_Latn-sot_Latn/dev.parquet - split: devtest path: data/eng_Latn-sot_Latn/devtest.parquet - config_name: spa_Latn data_files: - split: dev path: data/eng_Latn-spa_Latn/dev.parquet - split: devtest path: data/eng_Latn-spa_Latn/devtest.parquet - config_name: srd_Latn data_files: - split: dev path: data/eng_Latn-srd_Latn/dev.parquet - split: devtest path: data/eng_Latn-srd_Latn/devtest.parquet - config_name: srp_Cyrl data_files: - split: dev path: data/eng_Latn-srp_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-srp_Cyrl/devtest.parquet - config_name: ssw_Latn data_files: - split: dev path: data/eng_Latn-ssw_Latn/dev.parquet - split: devtest path: data/eng_Latn-ssw_Latn/devtest.parquet - config_name: sun_Latn data_files: - split: dev path: data/eng_Latn-sun_Latn/dev.parquet - split: devtest path: data/eng_Latn-sun_Latn/devtest.parquet - config_name: swe_Latn data_files: - split: dev path: data/eng_Latn-swe_Latn/dev.parquet - split: devtest path: data/eng_Latn-swe_Latn/devtest.parquet - config_name: swh_Latn data_files: - split: dev path: data/eng_Latn-swh_Latn/dev.parquet - split: devtest path: data/eng_Latn-swh_Latn/devtest.parquet - config_name: szl_Latn data_files: - split: dev path: data/eng_Latn-szl_Latn/dev.parquet - split: devtest path: data/eng_Latn-szl_Latn/devtest.parquet - config_name: tam_Taml data_files: - split: dev path: data/eng_Latn-tam_Taml/dev.parquet - split: devtest path: data/eng_Latn-tam_Taml/devtest.parquet - config_name: taq_Latn data_files: - split: dev path: data/eng_Latn-taq_Latn/dev.parquet - split: devtest path: data/eng_Latn-taq_Latn/devtest.parquet - config_name: taq_Tfng data_files: - split: dev path: data/eng_Latn-taq_Tfng/dev.parquet - split: devtest path: data/eng_Latn-taq_Tfng/devtest.parquet - config_name: tat_Cyrl data_files: - split: dev path: 
data/eng_Latn-tat_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-tat_Cyrl/devtest.parquet - config_name: tel_Telu data_files: - split: dev path: data/eng_Latn-tel_Telu/dev.parquet - split: devtest path: data/eng_Latn-tel_Telu/devtest.parquet - config_name: tgk_Cyrl data_files: - split: dev path: data/eng_Latn-tgk_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-tgk_Cyrl/devtest.parquet - config_name: tgl_Latn data_files: - split: dev path: data/eng_Latn-tgl_Latn/dev.parquet - split: devtest path: data/eng_Latn-tgl_Latn/devtest.parquet - config_name: tha_Thai data_files: - split: dev path: data/eng_Latn-tha_Thai/dev.parquet - split: devtest path: data/eng_Latn-tha_Thai/devtest.parquet - config_name: tir_Ethi data_files: - split: dev path: data/eng_Latn-tir_Ethi/dev.parquet - split: devtest path: data/eng_Latn-tir_Ethi/devtest.parquet - config_name: tpi_Latn data_files: - split: dev path: data/eng_Latn-tpi_Latn/dev.parquet - split: devtest path: data/eng_Latn-tpi_Latn/devtest.parquet - config_name: tsn_Latn data_files: - split: dev path: data/eng_Latn-tsn_Latn/dev.parquet - split: devtest path: data/eng_Latn-tsn_Latn/devtest.parquet - config_name: tso_Latn data_files: - split: dev path: data/eng_Latn-tso_Latn/dev.parquet - split: devtest path: data/eng_Latn-tso_Latn/devtest.parquet - config_name: tuk_Latn data_files: - split: dev path: data/eng_Latn-tuk_Latn/dev.parquet - split: devtest path: data/eng_Latn-tuk_Latn/devtest.parquet - config_name: tum_Latn data_files: - split: dev path: data/eng_Latn-tum_Latn/dev.parquet - split: devtest path: data/eng_Latn-tum_Latn/devtest.parquet - config_name: tur_Latn data_files: - split: dev path: data/eng_Latn-tur_Latn/dev.parquet - split: devtest path: data/eng_Latn-tur_Latn/devtest.parquet - config_name: twi_Latn data_files: - split: dev path: data/eng_Latn-twi_Latn/dev.parquet - split: devtest path: data/eng_Latn-twi_Latn/devtest.parquet - config_name: tzm_Tfng data_files: - split: dev path: 
data/eng_Latn-tzm_Tfng/dev.parquet - split: devtest path: data/eng_Latn-tzm_Tfng/devtest.parquet - config_name: uig_Arab data_files: - split: dev path: data/eng_Latn-uig_Arab/dev.parquet - split: devtest path: data/eng_Latn-uig_Arab/devtest.parquet - config_name: ukr_Cyrl data_files: - split: dev path: data/eng_Latn-ukr_Cyrl/dev.parquet - split: devtest path: data/eng_Latn-ukr_Cyrl/devtest.parquet - config_name: umb_Latn data_files: - split: dev path: data/eng_Latn-umb_Latn/dev.parquet - split: devtest path: data/eng_Latn-umb_Latn/devtest.parquet - config_name: urd_Arab data_files: - split: dev path: data/eng_Latn-urd_Arab/dev.parquet - split: devtest path: data/eng_Latn-urd_Arab/devtest.parquet - config_name: uzn_Latn data_files: - split: dev path: data/eng_Latn-uzn_Latn/dev.parquet - split: devtest path: data/eng_Latn-uzn_Latn/devtest.parquet - config_name: vec_Latn data_files: - split: dev path: data/eng_Latn-vec_Latn/dev.parquet - split: devtest path: data/eng_Latn-vec_Latn/devtest.parquet - config_name: vie_Latn data_files: - split: dev path: data/eng_Latn-vie_Latn/dev.parquet - split: devtest path: data/eng_Latn-vie_Latn/devtest.parquet - config_name: war_Latn data_files: - split: dev path: data/eng_Latn-war_Latn/dev.parquet - split: devtest path: data/eng_Latn-war_Latn/devtest.parquet - config_name: wol_Latn data_files: - split: dev path: data/eng_Latn-wol_Latn/dev.parquet - split: devtest path: data/eng_Latn-wol_Latn/devtest.parquet - config_name: xho_Latn data_files: - split: dev path: data/eng_Latn-xho_Latn/dev.parquet - split: devtest path: data/eng_Latn-xho_Latn/devtest.parquet - config_name: ydd_Hebr data_files: - split: dev path: data/eng_Latn-ydd_Hebr/dev.parquet - split: devtest path: data/eng_Latn-ydd_Hebr/devtest.parquet - config_name: yor_Latn data_files: - split: dev path: data/eng_Latn-yor_Latn/dev.parquet - split: devtest path: data/eng_Latn-yor_Latn/devtest.parquet - config_name: yue_Hant data_files: - split: dev path: 
data/eng_Latn-yue_Hant/dev.parquet - split: devtest path: data/eng_Latn-yue_Hant/devtest.parquet - config_name: zho_Hans data_files: - split: dev path: data/eng_Latn-zho_Hans/dev.parquet - split: devtest path: data/eng_Latn-zho_Hans/devtest.parquet - config_name: zho_Hant data_files: - split: dev path: data/eng_Latn-zho_Hant/dev.parquet - split: devtest path: data/eng_Latn-zho_Hant/devtest.parquet - config_name: zsm_Latn data_files: - split: dev path: data/eng_Latn-zsm_Latn/dev.parquet - split: devtest path: data/eng_Latn-zsm_Latn/devtest.parquet - config_name: zul_Latn data_files: - split: dev path: data/eng_Latn-zul_Latn/dev.parquet - split: devtest path: data/eng_Latn-zul_Latn/devtest.parquet ---

xSIM++ is an extension of [xSIM](https://github.com/facebookresearch/LASER/tree/main/tasks/xsim). Unlike xSIM, it evaluates on target-side data augmented with additional synthetic, hard-to-distinguish examples. More details are available in the publication: [xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages](https://arxiv.org/abs/2306.12907).
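The front matter above follows a regular pattern: each config name maps to a `dev` and a `devtest` parquet file under `data/eng_Latn-<config>/`, with `default` stored under `data/default/`. The sketch below reproduces that mapping and shows one plausible way to load a config with the Hugging Face `datasets` library. The helper names `data_files_for` and `load_config` are hypothetical and not part of the dataset; the repository id is taken from the page title.

```python
REPO_ID = "jaygala24/xsimplusplus"  # repository id as shown in the page title
SPLITS = ("dev", "devtest")         # the two splits declared for every config


def data_files_for(config_name: str) -> dict[str, str]:
    """Return the split -> parquet path mapping that the front matter declares."""
    # The "default" config has its own folder; every other config is stored
    # under a folder named after the English-to-target pair.
    folder = "default" if config_name == "default" else f"eng_Latn-{config_name}"
    return {split: f"data/{folder}/{split}.parquet" for split in SPLITS}


def load_config(config_name: str, split: str = "devtest"):
    """Load one config from the Hub (needs network access and `datasets`)."""
    # Imported lazily so data_files_for() stays dependency-free.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


if __name__ == "__main__":
    print(data_files_for("ace_Arab"))
```

Calling `load_config("ace_Arab", "dev")` would download the corresponding parquet file on first use; the column layout is not described on this page, so inspect the returned dataset before relying on specific field names.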
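Provider:
jaygala24
Summary of original information

Dataset overview

Basic information

  • Dataset name: xsimplusplus
  • Multilinguality: multilingual
  • Dataset size: 1K<n<400K

Language support

The dataset supports many languages, including but not limited to: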

  • ace, acm, acq, aeb, afr, ajp, aka, als, amh, apc, arb, asm, ast, awa, ayr, azb, azj, bak, bam, ban, bel, bem, ben, bho, bjn, bod, bos, bug, bul, cat, ceb, ces, cjk, ckb, crh, cym, dan, deu, dik, dyu, dzo, ell, eng, epo, est, eus, ewe, fao, fij, fin, fon, fra, fur, fuv, gaz, gla, gle, glg, grn, guj, hat, hau, heb, hin, hne, hrv, hun, hye, ibo, ilo, ind, isl, ita, jav, jpn, kab, kac, kam, kan, kas, kat, kaz, kbp, kea, khk, khm, kik, kin, kir, kmb, kmr, knc, kon, kor, lao, lij, lim, lin, lit, lmo, ltg, ltz, lua, lug, luo, lus, lvs, mag, mai, mal, mar, min, mkd, mlt, mni, mos, mri, mya, nld, nno, nob, npi, nso, nus, nya, oci, ory, pag, pan, pap, pbt, pes, plt, pol, por, prs, quy, ron, run, rus, sag, san, sat, scn, shn, sin, slk, slv, smo, sna, snd, som, sot, spa, srd, srp, ssw, sun, swe, swh, szl, tam, taq, tat, tel, tgk, tgl, tha, tir, tpi, tsn, tso, tuk, tum, tur, twi, tzm, uig, ukr, umb, urd, uzn, vec, vie, war, wol, xho, ydd, yor, yue, zho, zsm, zul

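License

  • License type: cc0-1.0, other

Configurations

The dataset contains multiple configurations. Each configuration corresponds to one language (and script) and has its own file paths, for example: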

  • Config name: default
    • Data files:
      • split: dev, path: data/default/dev.parquet
      • split: devtest, path: data/default/devtest.parquet
  • Config name: ace_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-ace_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-ace_Arab/devtest.parquet
  • Config name: ace_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-ace_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-ace_Latn/devtest.parquet
  • Config name: acm_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-acm_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-acm_Arab/devtest.parquet
  • Config name: acq_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-acq_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-acq_Arab/devtest.parquet
  • Config name: aeb_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-aeb_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-aeb_Arab/devtest.parquet
  • Config name: afr_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-afr_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-afr_Latn/devtest.parquet
  • Config name: ajp_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-ajp_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-ajp_Arab/devtest.parquet
  • Config name: aka_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-aka_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-aka_Latn/devtest.parquet
  • Config name: als_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-als_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-als_Latn/devtest.parquet
  • Config name: amh_Ethi
    • Data files:
      • split: dev, path: data/eng_Latn-amh_Ethi/dev.parquet
      • split: devtest, path: data/eng_Latn-amh_Ethi/devtest.parquet
  • Config name: apc_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-apc_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-apc_Arab/devtest.parquet
  • Config name: arb_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-arb_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-arb_Arab/devtest.parquet
  • Config name: arb_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-arb_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-arb_Latn/devtest.parquet
  • Config name: ars_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-ars_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-ars_Arab/devtest.parquet
  • Config name: ary_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-ary_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-ary_Arab/devtest.parquet
  • Config name: arz_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-arz_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-arz_Arab/devtest.parquet
  • Config name: asm_Beng
    • Data files:
      • split: dev, path: data/eng_Latn-asm_Beng/dev.parquet
      • split: devtest, path: data/eng_Latn-asm_Beng/devtest.parquet
  • Config name: ast_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-ast_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-ast_Latn/devtest.parquet
  • Config name: awa_Deva
    • Data files:
      • split: dev, path: data/eng_Latn-awa_Deva/dev.parquet
      • split: devtest, path: data/eng_Latn-awa_Deva/devtest.parquet
  • Config name: ayr_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-ayr_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-ayr_Latn/devtest.parquet
  • Config name: azb_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-azb_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-azb_Arab/devtest.parquet
  • Config name: azj_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-azj_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-azj_Latn/devtest.parquet
  • Config name: bak_Cyrl
    • Data files:
      • split: dev, path: data/eng_Latn-bak_Cyrl/dev.parquet
      • split: devtest, path: data/eng_Latn-bak_Cyrl/devtest.parquet
  • Config name: bam_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-bam_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-bam_Latn/devtest.parquet
  • Config name: ban_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-ban_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-ban_Latn/devtest.parquet
  • Config name: bel_Cyrl
    • Data files:
      • split: dev, path: data/eng_Latn-bel_Cyrl/dev.parquet
      • split: devtest, path: data/eng_Latn-bel_Cyrl/devtest.parquet
  • Config name: bem_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-bem_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-bem_Latn/devtest.parquet
  • Config name: ben_Beng
    • Data files:
      • split: dev, path: data/eng_Latn-ben_Beng/dev.parquet
      • split: devtest, path: data/eng_Latn-ben_Beng/devtest.parquet
  • Config name: bho_Deva
    • Data files:
      • split: dev, path: data/eng_Latn-bho_Deva/dev.parquet
      • split: devtest, path: data/eng_Latn-bho_Deva/devtest.parquet
  • Config name: bjn_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-bjn_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-bjn_Arab/devtest.parquet
  • Config name: bjn_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-bjn_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-bjn_Latn/devtest.parquet
  • Config name: bod_Tibt
    • Data files:
      • split: dev, path: data/eng_Latn-bod_Tibt/dev.parquet
      • split: devtest, path: data/eng_Latn-bod_Tibt/devtest.parquet
  • Config name: bos_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-bos_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-bos_Latn/devtest.parquet
  • Config name: bug_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-bug_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-bug_Latn/devtest.parquet
  • Config name: bul_Cyrl
    • Data files:
      • split: dev, path: data/eng_Latn-bul_Cyrl/dev.parquet
      • split: devtest, path: data/eng_Latn-bul_Cyrl/devtest.parquet
  • Config name: cat_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-cat_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-cat_Latn/devtest.parquet
  • Config name: ceb_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-ceb_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-ceb_Latn/devtest.parquet
  • Config name: ces_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-ces_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-ces_Latn/devtest.parquet
  • Config name: cjk_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-cjk_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-cjk_Latn/devtest.parquet
  • Config name: ckb_Arab
    • Data files:
      • split: dev, path: data/eng_Latn-ckb_Arab/dev.parquet
      • split: devtest, path: data/eng_Latn-ckb_Arab/devtest.parquet
  • Config name: crh_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-crh_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-crh_Latn/devtest.parquet
  • Config name: cym_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-cym_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-cym_Latn/devtest.parquet
  • Config name: dan_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-dan_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-dan_Latn/devtest.parquet
  • Config name: deu_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-deu_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-deu_Latn/devtest.parquet
  • Config name: dik_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-dik_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-dik_Latn/devtest.parquet
  • Config name: dyu_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-dyu_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-dyu_Latn/devtest.parquet
  • Config name: dzo_Tibt
    • Data files:
      • split: dev, path: data/eng_Latn-dzo_Tibt/dev.parquet
      • split: devtest, path: data/eng_Latn-dzo_Tibt/devtest.parquet
  • Config name: ell_Grek
    • Data files:
      • split: dev, path: data/eng_Latn-ell_Grek/dev.parquet
      • split: devtest, path: data/eng_Latn-ell_Grek/devtest.parquet
  • Config name: epo_Latn
    • Data files:
      • split: dev, path: data/eng_Latn-epo_Latn/dev.parquet
      • split: devtest, path: data/eng_Latn-epo_Latn/devtest.parquet
  • Config name: est_Latn
    • Data files:
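
The configurations listed above follow a regular pattern: every language-specific config stores its splits under `data/eng_Latn-<config>/`, while `default` uses `data/default/`. The sketch below encodes that pattern. The helper names (`parquet_path`, `split_config`, `load_split`) are hypothetical and not part of the dataset; `load_split` assumes the third-party `datasets` library is installed and that the repository id `jaygala24/xsimplusplus` is reachable over the network.

```python
def parquet_path(config_name, split):
    """Return the repository-relative parquet path for a config and split."""
    if split not in ("dev", "devtest"):
        raise ValueError(f"unknown split: {split!r}")
    if config_name == "default":
        return f"data/default/{split}.parquet"
    # Language-specific configs are stored as English-to-X pairs.
    return f"data/eng_Latn-{config_name}/{split}.parquet"


def split_config(config_name):
    """Split a name such as 'ace_Arab' into (language code, script code)."""
    language, script = config_name.split("_", 1)
    return language, script


def load_split(config_name, split):
    """Load one split with the Hugging Face `datasets` library (needs network)."""
    from datasets import load_dataset  # lazy import of a third-party dependency
    return load_dataset("jaygala24/xsimplusplus", config_name, split=split)


print(parquet_path("ace_Arab", "dev"))  # data/eng_Latn-ace_Arab/dev.parquet
print(split_config("bjn_Latn"))         # ('bjn', 'Latn')
```

Splitting the config name into language and script also explains why codes such as `ace`, `arb`, and `bjn` correspond to more than one configuration: the same language is provided in more than one script.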
