
belebele

ModelScope community · Updated 2026-01-06 · Indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/facebook/belebele
Description:
# The Belebele Benchmark for Massively Multilingual NLU Evaluation

Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. This dataset enables the evaluation of mono- and multi-lingual models in high-, medium-, and low-resource languages. Each question has four multiple-choice answers and is linked to a short passage from the [FLORES-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset. The human annotation procedure was carefully curated to create questions that discriminate between different levels of generalizable language comprehension and is reinforced by extensive quality checks. While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. Belebele opens up new avenues for evaluating and analyzing the multilingual abilities of language models and NLP systems.

Please refer to our paper for more details, presented at ACL 2024: [The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants](https://ai.meta.com/research/publications/the-belebele-benchmark-a-parallel-reading-comprehension-dataset-in-122-language-variants/).
More details are also available at https://github.com/facebookresearch/belebele

## Citation

If you use this data in your work, please cite:

```bibtex
@inproceedings{bandarkar-etal-2024-belebele,
    title = "The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants",
    author = "Bandarkar, Lucas and Liang, Davis and Muller, Benjamin and Artetxe, Mikel and Shukla, Satya Narayan and Husa, Donald and Goyal, Naman and Krishnan, Abhinandan and Zettlemoyer, Luke and Khabsa, Madian",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.44",
    pages = "749--775",
}
```

## Composition

- 900 questions per language variant
- 488 distinct passages, each with 1-2 associated questions.
- For each question, there are 4 multiple-choice answers, exactly 1 of which is correct.
- 122 languages/language variants (including English).
- 900 x 122 = 109,800 total questions.

## Further Stats

- 122 language variants, but 115 distinct languages (ignoring scripts)
- 27 language families
- 29 scripts
- Avg. words per passage = 79.1 (std = 26.2)
- Avg. sentences per passage = 4.1 (std = 1.4)
- Avg. words per question = 12.9 (std = 4.0)
- Avg. words per answer = 4.2 (std = 2.9)

## Plausible Evaluation Settings

Thanks to the parallel nature of the dataset and the simplicity of the task, there are many possible settings in which we can evaluate language models. In all evaluation settings, the metric of interest is simple accuracy (# correct / total).

Evaluating models on Belebele in English can be done via finetuning, few-shot, or zero-shot. For other target languages, we propose the non-exhaustive list of evaluation settings below. Settings that are compatible with evaluating non-English models (monolingual or cross-lingual) are denoted with `^`.

#### No finetuning

- **Zero-shot with natural language instructions (English instructions)**
    - For chat-finetuned models, we give the model English instructions for the task and the sample in the target language in the same input.
    - For our experiments, we instruct the model to provide the letter `A`, `B`, `C`, or `D`. We perform post-processing steps and accept answers predicted as e.g. `(A)` instead of `A`. We sometimes additionally remove the prefix `The correct answer is` for predictions that do not start with one of the four accepted answers.
    - Sample instructions can be found at the [dataset github repo](https://github.com/facebookresearch/belebele).
- **Zero-shot with natural language instructions (translated instructions)** ^
    - Same as above, except the instructions are translated to the target language so that the instructions and samples are in the same language. The instructions can be human or machine-translated.
- **Few-shot in-context learning (English examples)**
    - A few samples (e.g. 5) are taken from the English training set (see below) and prompted to the model. Then, the model is evaluated with the same template but with the passages, questions, and answers in the target language.
    - For our experiments, we use the template: ```P: <passage> \n Q: <question> \n A: <mc answer 1> \n B: <mc answer 2> \n C: <mc answer 3> \n D: <mc answer 4> \n Answer: <Correct answer letter>```. We perform prediction by picking the answer within `[A, B, C, D]` that has the highest probability relative to the others.
- **Few-shot in-context learning (translated examples)** ^
    - Same as above, except the samples from the training set are translated to the target language so that the examples and evaluation data are in the same language. The training samples can be human or machine-translated.
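
The prompt template, answer post-processing, and accuracy metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function names (`format_prompt`, `normalize_prediction`, `pick_by_score`, `accuracy`), the exact whitespace around the newlines in the template, and the regular expressions are all assumptions made for the example.

```python
import re

# Template from the few-shot setting; newline placement is an assumption.
TEMPLATE = (
    "P: {passage}\nQ: {question}\n"
    "A: {a}\nB: {b}\nC: {c}\nD: {d}\nAnswer:"
)

def format_prompt(passage, question, answers, correct_letter=None):
    """Render one sample. Pass correct_letter for few-shot demonstrations,
    omit it for the sample being evaluated."""
    prompt = TEMPLATE.format(
        passage=passage, question=question,
        a=answers[0], b=answers[1], c=answers[2], d=answers[3],
    )
    return prompt + (f" {correct_letter}" if correct_letter else "")

# Hypothetical patterns: a leading letter, optionally in parentheses,
# that is not the first letter of a longer word.
_LETTER = re.compile(r"^\s*\(?([ABCD])\)?(?![A-Za-z])")
_PREFIX = re.compile(r"^\s*the correct answer is[:\s]*", re.IGNORECASE)

def normalize_prediction(text):
    """Map raw model output to 'A'..'D', or None if no letter is found."""
    match = _LETTER.match(text)
    if match is None:
        # Only strip the prefix when the output does not already start
        # with an accepted answer, as described in the text.
        match = _LETTER.match(_PREFIX.sub("", text))
    return match.group(1) if match else None

def pick_by_score(scores):
    """Few-shot prediction: the letter with the highest score among A-D.
    `scores` maps each letter to a probability or log-probability."""
    return max("ABCD", key=lambda letter: scores[letter])

def accuracy(predictions, gold):
    """Simple accuracy: # correct / total."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

For example, `normalize_prediction("The correct answer is (B).")` yields `"B"`, while an output such as `"Apple"` yields `None` and is counted as incorrect by `accuracy`.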

#### With finetuning

- **English finetune & multilingual evaluation**
    - The model is finetuned to the task using the English training set, probably with a sequence classification head. Then the model is evaluated in all the target languages individually. For results presented in the paper we used [the HuggingFace library](https://huggingface.co/docs/transformers/en/model_doc/xlm-roberta#transformers.XLMRobertaForMultipleChoice).
- **English finetune & cross-lingual evaluation**
    - Same as above, except the model is evaluated in a cross-lingual setting, where for each question, the passage & answers could be provided in a different language. For example, the passage could be in language `x`, the question in language `y`, and the answers in language `z`.
- **Translate-train** ^
    - For each target language, the model is individually finetuned on training samples that have been machine-translated from English to that language. Each model is then evaluated in the respective target language.
- **Translate-train-all**
    - Similar to above, except here the model is trained on translated samples from all target languages at once. The single finetuned model is then evaluated on all target languages.
- **Translate-train-all & cross-lingual evaluation**
    - Same as above, except the single finetuned model is evaluated in a cross-lingual setting, where for each question, the passage & answers could be provided in a different language.
- **Translate-test**
    - The model is finetuned using the English training data. The evaluation dataset is then machine-translated to English, and the model is evaluated on the English translation.
    - This setting is primarily a reflection of the quality of the machine translation system, but is useful for comparison to multilingual models.

In addition, FLORES-200 contains 83 further languages for which questions were not translated for Belebele. Since the passages exist in those target languages, machine-translating the questions & answers may enable decent evaluation of machine reading comprehension in those languages.

## Training Set

As discussed in the paper, we also provide an assembled training set, available at the [github repo](https://github.com/facebookresearch/belebele).

The Belebele dataset is intended to be used only as a test set, and not for training or validation. Therefore, for models that require additional task-specific training, we instead propose using an assembled training set consisting of samples from pre-existing multiple-choice QA datasets in English. We considered diverse datasets and determined the most compatible to be [RACE](https://www.cs.cmu.edu/~glai1/data/race/), [SciQ](https://allenai.org/data/sciq), [MultiRC](https://cogcomp.seas.upenn.edu/multirc/), [MCTest](https://mattr1.github.io/mctest/), [MCScript2.0](https://aclanthology.org/S19-1012/), and [ReClor](https://whyu.me/reclor/).

For each of the six datasets, we unpack and restructure the passages and questions from their respective formats. We then filter out less suitable samples (e.g. questions with multiple correct answers). In the end, the dataset comprises 67.5k training samples and 3.7k development samples, more than half of which are from RACE. We provide a script (`assemble_training_set.py`) to reconstruct this dataset for anyone to perform task finetuning.

Since the training set is a joint sample of other datasets, it is governed by a different license. We do not claim any of that work or datasets to be our own. See the Licenses section in the README of https://github.com/facebookresearch/belebele .
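
The filtering step mentioned in this section (dropping questions with multiple correct answers after restructuring) can be illustrated roughly as follows. This is not the logic of `assemble_training_set.py`, which is not reproduced here: the `MCSample` record, its field names, and the helper functions are hypothetical, and the only criterion applied is the one the text names explicitly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class MCSample:
    """Hypothetical unified record for a restructured multiple-choice sample."""
    passage: str
    question: str
    answers: List[str]
    correct: List[int]  # indices of the options marked correct at the source
    source: str         # name of the originating dataset

def is_suitable(sample: MCSample) -> bool:
    """Keep only samples with exactly one correct, in-range answer."""
    return (
        len(sample.correct) == 1
        and 0 <= sample.correct[0] < len(sample.answers)
    )

def filter_samples(samples: List[MCSample]) -> List[MCSample]:
    """Drop unsuitable samples, preserving the original order."""
    return [s for s in samples if is_suitable(s)]

def source_share(samples: List[MCSample]) -> dict:
    """Fraction of samples contributed by each source dataset."""
    counts = Counter(s.source for s in samples)
    return {name: n / len(samples) for name, n in counts.items()}
```

A helper like `source_share` is one way to check a statement such as "more than half of which are from RACE" against a reconstructed set.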

## Languages in Belebele

FLORES-200 Code | English Name | Script | Family
---|---|---|---
acm_Arab | Mesopotamian Arabic | Arab | Afro-Asiatic
afr_Latn | Afrikaans | Latn | Germanic
als_Latn | Tosk Albanian | Latn | Paleo-Balkanic
amh_Ethi | Amharic | Ethi | Afro-Asiatic
apc_Arab | North Levantine Arabic | Arab | Afro-Asiatic
arb_Arab | Modern Standard Arabic | Arab | Afro-Asiatic
arb_Latn | Modern Standard Arabic (Romanized) | Latn | Afro-Asiatic
ars_Arab | Najdi Arabic | Arab | Afro-Asiatic
ary_Arab | Moroccan Arabic | Arab | Afro-Asiatic
arz_Arab | Egyptian Arabic | Arab | Afro-Asiatic
asm_Beng | Assamese | Beng | Indo-Aryan
azj_Latn | North Azerbaijani | Latn | Turkic
bam_Latn | Bambara | Latn | Mande
ben_Beng | Bengali | Beng | Indo-Aryan
ben_Latn | Bengali (Romanized) | Latn | Indo-Aryan
bod_Tibt | Standard Tibetan | Tibt | Sino-Tibetan
bul_Cyrl | Bulgarian | Cyrl | Balto-Slavic
cat_Latn | Catalan | Latn | Romance
ceb_Latn | Cebuano | Latn | Austronesian
ces_Latn | Czech | Latn | Balto-Slavic
ckb_Arab | Central Kurdish | Arab | Iranian
dan_Latn | Danish | Latn | Germanic
deu_Latn | German | Latn | Germanic
ell_Grek | Greek | Grek | Hellenic
eng_Latn | English | Latn | Germanic
est_Latn | Estonian | Latn | Uralic
eus_Latn | Basque | Latn | Basque
fin_Latn | Finnish | Latn | Uralic
fra_Latn | French | Latn | Romance
fuv_Latn | Nigerian Fulfulde | Latn | Atlantic-Congo
gaz_Latn | West Central Oromo | Latn | Afro-Asiatic
grn_Latn | Guarani | Latn | Tupian
guj_Gujr | Gujarati | Gujr | Indo-Aryan
hat_Latn | Haitian Creole | Latn | Atlantic-Congo
hau_Latn | Hausa | Latn | Afro-Asiatic
heb_Hebr | Hebrew | Hebr | Afro-Asiatic
hin_Deva | Hindi | Deva | Indo-Aryan
hin_Latn | Hindi (Romanized) | Latn | Indo-Aryan
hrv_Latn | Croatian | Latn | Balto-Slavic
hun_Latn | Hungarian | Latn | Uralic
hye_Armn | Armenian | Armn | Armenian
ibo_Latn | Igbo | Latn | Atlantic-Congo
ilo_Latn | Ilocano | Latn | Austronesian
ind_Latn | Indonesian | Latn | Austronesian
isl_Latn | Icelandic | Latn | Germanic
ita_Latn | Italian | Latn | Romance
jav_Latn | Javanese | Latn | Austronesian
jpn_Jpan | Japanese | Jpan | Japonic
kac_Latn | Jingpho | Latn | Sino-Tibetan
kan_Knda | Kannada | Knda | Dravidian
kat_Geor | Georgian | Geor | Kartvelian
kaz_Cyrl | Kazakh | Cyrl | Turkic
kea_Latn | Kabuverdianu | Latn | Portuguese Creole
khk_Cyrl | Halh Mongolian | Cyrl | Mongolic
khm_Khmr | Khmer | Khmr | Austroasiatic
kin_Latn | Kinyarwanda | Latn | Atlantic-Congo
kir_Cyrl | Kyrgyz | Cyrl | Turkic
kor_Hang | Korean | Hang | Koreanic
lao_Laoo | Lao | Laoo | Kra-Dai
lin_Latn | Lingala | Latn | Atlantic-Congo
lit_Latn | Lithuanian | Latn | Balto-Slavic
lug_Latn | Ganda | Latn | Atlantic-Congo
luo_Latn | Luo | Latn | Nilo-Saharan
lvs_Latn | Standard Latvian | Latn | Balto-Slavic
mal_Mlym | Malayalam | Mlym | Dravidian
mar_Deva | Marathi | Deva | Indo-Aryan
mkd_Cyrl | Macedonian | Cyrl | Balto-Slavic
mlt_Latn | Maltese | Latn | Afro-Asiatic
mri_Latn | Maori | Latn | Austronesian
mya_Mymr | Burmese | Mymr | Sino-Tibetan
nld_Latn | Dutch | Latn | Germanic
nob_Latn | Norwegian Bokmål | Latn | Germanic
npi_Deva | Nepali | Deva | Indo-Aryan
npi_Latn | Nepali (Romanized) | Latn | Indo-Aryan
nso_Latn | Northern Sotho | Latn | Atlantic-Congo
nya_Latn | Nyanja | Latn | Afro-Asiatic
ory_Orya | Odia | Orya | Indo-Aryan
pan_Guru | Eastern Panjabi | Guru | Indo-Aryan
pbt_Arab | Southern Pashto | Arab | Indo-Aryan
pes_Arab | Western Persian | Arab | Iranian
plt_Latn | Plateau Malagasy | Latn | Austronesian
pol_Latn | Polish | Latn | Balto-Slavic
por_Latn | Portuguese | Latn | Romance
ron_Latn | Romanian | Latn | Romance
rus_Cyrl | Russian | Cyrl | Balto-Slavic
shn_Mymr | Shan | Mymr | Kra-Dai
sin_Latn | Sinhala (Romanized) | Latn | Indo-Aryan
sin_Sinh | Sinhala | Sinh | Indo-Aryan
slk_Latn | Slovak | Latn | Balto-Slavic
slv_Latn | Slovenian | Latn | Balto-Slavic
sna_Latn | Shona | Latn | Atlantic-Congo
snd_Arab | Sindhi | Arab | Indo-Aryan
som_Latn | Somali | Latn | Afro-Asiatic
sot_Latn | Southern Sotho | Latn | Atlantic-Congo
spa_Latn | Spanish | Latn | Romance
srp_Cyrl | Serbian | Cyrl | Balto-Slavic
ssw_Latn | Swati | Latn | Atlantic-Congo
sun_Latn | Sundanese | Latn | Austronesian
swe_Latn | Swedish | Latn | Germanic
swh_Latn | Swahili | Latn | Atlantic-Congo
tam_Taml | Tamil | Taml | Dravidian
tel_Telu | Telugu | Telu | Dravidian
tgk_Cyrl | Tajik | Cyrl | Iranian
tgl_Latn | Tagalog | Latn | Austronesian
tha_Thai | Thai | Thai | Kra-Dai
tir_Ethi | Tigrinya | Ethi | Afro-Asiatic
tsn_Latn | Tswana | Latn | Atlantic-Congo
tso_Latn | Tsonga | Latn | Afro-Asiatic
tur_Latn | Turkish | Latn | Turkic
ukr_Cyrl | Ukrainian | Cyrl | Balto-Slavic
urd_Arab | Urdu | Arab | Indo-Aryan
urd_Latn | Urdu (Romanized) | Latn | Indo-Aryan
uzn_Latn | Northern Uzbek | Latn | Turkic
vie_Latn | Vietnamese | Latn | Austroasiatic
war_Latn | Waray | Latn | Austronesian
wol_Latn | Wolof | Latn | Atlantic-Congo
xho_Latn | Xhosa | Latn | Atlantic-Congo
yor_Latn | Yoruba | Latn | Atlantic-Congo
zho_Hans | Chinese (Simplified) | Hans | Sino-Tibetan
zho_Hant | Chinese (Traditional) | Hant | Sino-Tibetan
zsm_Latn | Standard Malay | Latn | Austronesian
zul_Latn | Zulu | Latn | Atlantic-Congo
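
Each code in the table is a language identifier and a script identifier joined by an underscore. The sketch below, with hypothetical helper names (`split_code`, `multi_script_languages`), shows how grouping codes by their language part reveals which languages appear in more than one script. It uses only a handful of codes copied from the table.

```python
from collections import defaultdict

def split_code(code):
    """Split a FLORES-200 code such as 'hin_Deva' into (language, script)."""
    lang, script = code.split("_")
    return lang, script

def multi_script_languages(codes):
    """Return {language: sorted scripts} for languages listed in 2+ scripts."""
    scripts = defaultdict(set)
    for code in codes:
        lang, script = split_code(code)
        scripts[lang].add(script)
    return {lang: sorted(s) for lang, s in scripts.items() if len(s) > 1}

# A small subset of the codes from the table above.
codes = ["arb_Arab", "arb_Latn", "eng_Latn", "hin_Deva",
         "hin_Latn", "zho_Hans", "zho_Hant"]
print(multi_script_languages(codes))
# → {'arb': ['Arab', 'Latn'], 'hin': ['Deva', 'Latn'], 'zho': ['Hans', 'Hant']}
```

In the table as listed, seven language identifiers occur with two scripts (`arb`, `ben`, `hin`, `npi`, `sin`, `urd`, `zho`), which is consistent with 122 variants reducing to 115 distinct languages when scripts are ignored.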

Provider:
maas
Created:
2025-05-20
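Background overview
Belebele is a multilingual machine reading comprehension dataset covering 122 language variants. Each question has four multiple-choice answers and is linked to a short passage from the FLORES-200 dataset. The dataset is designed to evaluate monolingual and multilingual models in high-, medium-, and low-resource languages, and because it is fully parallel, model performance can be compared directly across languages.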