belebele
Source: ModelScope community (updated 2026-01-06, indexed 2025-05-24)
Download link: https://modelscope.cn/datasets/facebook/belebele
# The Belebele Benchmark for Massively Multilingual NLU Evaluation
Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. This dataset enables the evaluation of mono- and multi-lingual models in high-, medium-, and low-resource languages. Each question has four multiple-choice answers and is linked to a short passage from the [FLORES-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset. The human annotation procedure was carefully curated to create questions that discriminate between different levels of generalizable language comprehension and is reinforced by extensive quality checks. While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. Belebele opens up new avenues for evaluating and analyzing the multilingual abilities of language models and NLP systems.
Please refer to our paper for more details, presented at ACL 2024: [The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants](https://ai.meta.com/research/publications/the-belebele-benchmark-a-parallel-reading-comprehension-dataset-in-122-language-variants/).
More details are also available at https://github.com/facebookresearch/belebele
## Citation
If you use this data in your work, please cite:
```bibtex
@inproceedings{bandarkar-etal-2024-belebele,
title = "The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants",
author = "Bandarkar, Lucas and
Liang, Davis and
Muller, Benjamin and
Artetxe, Mikel and
Shukla, Satya Narayan and
Husa, Donald and
Goyal, Naman and
Krishnan, Abhinandan and
Zettlemoyer, Luke and
Khabsa, Madian",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.44",
pages = "749--775",
}
```
## Composition
- 900 questions per language variant
- 488 distinct passages, each with 1-2 associated questions.
- For each question, there are 4 multiple-choice answers, exactly 1 of which is correct.
- 122 languages/language variants (including English).
- 900 x 122 = 109,800 total questions.
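The composition above can be pictured as a simple record type. The following is a minimal sketch, not the dataset's actual schema: the class name `BelebeleItem` and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

QUESTIONS_PER_VARIANT = 900
NUM_VARIANTS = 122


@dataclass(frozen=True)
class BelebeleItem:
    """One multiple-choice question (hypothetical field names)."""

    lang_code: str                      # FLORES-200 code, e.g. "eng_Latn"
    passage: str                        # short passage from FLORES-200
    question: str
    answers: tuple[str, str, str, str]  # exactly four candidate answers
    correct_index: int                  # 0-based index of the single correct answer

    def __post_init__(self) -> None:
        # Enforce the "four answers, exactly one correct" constraint.
        if len(self.answers) != 4:
            raise ValueError("each question needs exactly 4 answers")
        if not 0 <= self.correct_index < 4:
            raise ValueError("correct_index must be in 0..3")


# Total number of questions across all language variants.
total_questions = QUESTIONS_PER_VARIANT * NUM_VARIANTS
```

Storing a single `correct_index` rather than a per-answer flag makes the "exactly one correct answer" property hold by construction.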
## Further Stats
- 122 language variants, but 115 distinct languages (ignoring scripts)
- 27 language families
- 29 scripts
- Avg. words per passage = 79.1 (std = 26.2)
- Avg. sentences per passage = 4.1 (std = 1.4)
- Avg. words per question = 12.9 (std = 4.0)
- Avg. words per answer = 4.2 (std = 2.9)
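The gap between language variants and distinct languages comes from languages that appear under more than one script. As a small sketch, assuming each FLORES-200 code has the form `<language>_<script>`, the two counts can be derived by splitting the codes; the helper name `split_flores_code` is hypothetical, and the sample codes are taken from the language table in this document.

```python
def split_flores_code(code: str) -> tuple[str, str]:
    """Split a FLORES-200 code such as 'hin_Deva' into (language, script)."""
    lang, script = code.split("_", 1)
    return lang, script


# A handful of codes from the language table, including script variants.
sample_codes = [
    "hin_Deva", "hin_Latn",
    "urd_Arab", "urd_Latn",
    "zho_Hans", "zho_Hant",
    "eng_Latn",
]

num_variants = len(sample_codes)
languages = {split_flores_code(code)[0] for code in sample_codes}
scripts = {split_flores_code(code)[1] for code in sample_codes}

print(num_variants, len(languages), len(scripts))  # 7 4 5
```

Applied to the full table, the same grouping collapses the seven languages listed under two scripts (`arb`, `ben`, `hin`, `npi`, `sin`, `urd`, `zho`), which accounts for 122 variants versus 115 distinct languages.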
## Plausible Evaluation Settings
Thanks to the parallel nature of the dataset and the simplicity of the task, there are many possible settings in which we can evaluate language models. In all evaluation settings, the metric of interest is simple accuracy (# correct / total).
Evaluating models on Belebele in English can be done via finetuning, few-shot, or zero-shot. For other target languages, we propose the non-exhaustive list of evaluation settings below. Settings that are compatible with evaluating non-English models (monolingual or cross-lingual) are denoted with `^`.
#### No finetuning
- **Zero-shot with natural language instructions (English instructions)**
  - For chat-finetuned models, we give the model English instructions for the task and the sample in the target language in the same input.
  - For our experiments, we instruct the model to provide the letter `A`, `B`, `C`, or `D`. We perform post-processing steps and accept answers predicted as e.g. `(A)` instead of `A`. We sometimes additionally remove the prefix `The correct answer is` for predictions that do not start with one of the four accepted answers.
  - Sample instructions can be found at the [dataset GitHub repo](https://github.com/facebookresearch/belebele).
- **Zero-shot with natural language instructions (translated instructions)** ^
  - Same as above, except the instructions are translated to the target language so that the instructions and samples are in the same language. The instructions can be human or machine-translated.
- **Few-shot in-context learning (English examples)**
  - A few samples (e.g. 5) are taken from the English training set (see below) and given to the model as in-context examples. Then, the model is evaluated with the same template but with the passages, questions, and answers in the target language.
  - For our experiments, we use the template: `P: <passage> \n Q: <question> \n A: <mc answer 1> \n B: <mc answer 2> \n C: <mc answer 3> \n D: <mc answer 4> \n Answer: <Correct answer letter>`. We perform prediction by picking the answer within `[A, B, C, D]` that has the highest probability relative to the others.
- **Few-shot in-context learning (translated examples)** ^
  - Same as above, except the samples from the training set are translated to the target language so that the examples and evaluation data are in the same language. The training samples can be human or machine-translated.
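The prompting, answer selection, and post-processing steps described in this list can be sketched as follows. This is an illustrative approximation and not the authors' evaluation code: all function names are hypothetical, the template is joined with plain newlines, and the per-letter scores passed to `pick_letter` are assumed to come from whatever model is being evaluated.

```python
import re

LETTERS = ("A", "B", "C", "D")
PREFIX = "The correct answer is"
# An accepted letter, optionally wrapped in parentheses, not followed by a letter.
_LETTER_RE = re.compile(r"\(?([ABCD])\)?(?![A-Za-z])")


def format_example(passage, question, answers, answer_letter=""):
    """Render one sample in the P/Q/A-D/Answer template."""
    lines = [f"P: {passage}", f"Q: {question}"]
    lines += [f"{letter}: {text}" for letter, text in zip(LETTERS, answers)]
    lines.append(f"Answer: {answer_letter}".rstrip())
    return "\n".join(lines)


def build_few_shot_prompt(shots, target):
    """shots: (passage, question, answers, letter) tuples; target: (passage, question, answers)."""
    blocks = [format_example(*shot) for shot in shots]
    blocks.append(format_example(*target))  # ends with an open "Answer:"
    return "\n\n".join(blocks)


def pick_letter(letter_scores):
    """Pick the letter among A-D with the highest score (e.g. log-probability)."""
    return max(LETTERS, key=lambda letter: letter_scores[letter])


def normalize_prediction(text):
    """Map a free-form zero-shot output to 'A'..'D', or None if unparseable."""
    text = text.strip()
    match = _LETTER_RE.match(text)
    if match is None and text.startswith(PREFIX):
        # Strip the prefix only when the output does not already start with a letter.
        match = _LETTER_RE.match(text[len(PREFIX):].strip())
    return match.group(1) if match else None


def accuracy(predictions, gold):
    """Simple accuracy: number correct divided by total."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

For instance, `normalize_prediction("(A)")` and `normalize_prediction("The correct answer is A")` both return `"A"`, while an output such as `"Apple"` returns `None` and would count as incorrect.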
#### With finetuning
- **English finetune & multilingual evaluation**
  - The model is finetuned to the task using the English training set, typically with a sequence classification head. Then the model is evaluated in all the target languages individually. For results presented in the paper we used [`XLMRobertaForMultipleChoice` from the HuggingFace Transformers library](https://huggingface.co/docs/transformers/en/model_doc/xlm-roberta#transformers.XLMRobertaForMultipleChoice).
- **English finetune & cross-lingual evaluation**
  - Same as above, except the model is evaluated in a cross-lingual setting, where for each question, the passage, question, and answers could each be provided in a different language. For example, the passage could be in language `x`, the question in language `y`, and the answers in language `z`.
- **Translate-train** ^
  - For each target language, the model is individually finetuned on training samples that have been machine-translated from English to that language. Each model is then evaluated in the respective target language.
- **Translate-train-all**
  - Similar to above, except here the model is trained on translated samples from all target languages at once. The single finetuned model is then evaluated on all target languages.
- **Translate-train-all & cross-lingual evaluation**
  - Same as above, except the single finetuned model is evaluated in a cross-lingual setting, where for each question, the passage, question, and answers could each be provided in a different language.
- **Translate-test**
  - The model is finetuned using the English training data. The evaluation dataset is then machine-translated to English, and the model is evaluated on the English translation.
  - This setting is primarily a reflection of the quality of the machine translation system, but is useful for comparison to multilingual models.
FLORES-200 also contains 83 additional languages for which questions were not translated for Belebele. Since the passages exist in those target languages, machine-translating the questions & answers may enable decent evaluation of machine reading comprehension in those languages.
## Training Set
As discussed in the paper, we also provide an assembled training set, available at the [GitHub repo](https://github.com/facebookresearch/belebele).
The Belebele dataset is intended to be used only as a test set, and not for training or validation. Therefore, for models that require additional task-specific training, we instead propose using an assembled training set consisting of samples from pre-existing multiple-choice QA datasets in English. We considered diverse datasets, and determined the most compatible to be [RACE](https://www.cs.cmu.edu/~glai1/data/race/), [SciQ](https://allenai.org/data/sciq), [MultiRC](https://cogcomp.seas.upenn.edu/multirc/), [MCTest](https://mattr1.github.io/mctest/), [MCScript2.0](https://aclanthology.org/S19-1012/), and [ReClor](https://whyu.me/reclor/).
For each of the six datasets, we unpack and restructure the passages and questions from their respective formats. We then filter out less suitable samples (e.g. questions with multiple correct answers). In the end, the dataset comprises 67.5k training samples and 3.7k development samples, more than half of which are from RACE. We provide a script (`assemble_training_set.py`) to reconstruct this dataset for anyone to perform task finetuning.
Since the training set is a joint sample of other datasets, it is governed by a different license. We do not claim any of that work or datasets to be our own. See the Licenses section of the README at https://github.com/facebookresearch/belebele.
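The filtering step mentioned above can be illustrated with a short sketch. This is not the logic of `assemble_training_set.py`: the record layout (`options`, `correct_indices`) and function names are hypothetical, and requiring exactly four options is an assumption made here to mirror the Belebele format.

```python
def is_suitable(sample, required_options=4):
    """Keep samples with exactly one correct answer and the expected option count."""
    has_single_answer = len(sample["correct_indices"]) == 1
    has_expected_options = len(sample["options"]) == required_options  # assumption
    return has_single_answer and has_expected_options


def filter_samples(samples, required_options=4):
    """Return only the samples that pass the suitability check."""
    return [s for s in samples if is_suitable(s, required_options)]


# Toy records standing in for restructured samples from the source datasets.
raw_samples = [
    {"id": "s1", "options": ["a", "b", "c", "d"], "correct_indices": [1]},
    {"id": "s2", "options": ["a", "b", "c", "d"], "correct_indices": [0, 3]},  # multiple correct
    {"id": "s3", "options": ["a", "b"], "correct_indices": [0]},               # too few options
]

kept = filter_samples(raw_samples)
print([s["id"] for s in kept])  # ['s1']
```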
## Languages in Belebele
FLORES-200 Code | English Name | Script | Family
---|---|---|---
acm_Arab | Mesopotamian Arabic | Arab | Afro-Asiatic
afr_Latn | Afrikaans | Latn | Germanic
als_Latn | Tosk Albanian | Latn | Paleo-Balkanic
amh_Ethi | Amharic | Ethi | Afro-Asiatic
apc_Arab | North Levantine Arabic | Arab | Afro-Asiatic
arb_Arab | Modern Standard Arabic | Arab | Afro-Asiatic
arb_Latn | Modern Standard Arabic (Romanized) | Latn | Afro-Asiatic
ars_Arab | Najdi Arabic | Arab | Afro-Asiatic
ary_Arab | Moroccan Arabic | Arab | Afro-Asiatic
arz_Arab | Egyptian Arabic | Arab | Afro-Asiatic
asm_Beng | Assamese | Beng | Indo-Aryan
azj_Latn | North Azerbaijani | Latn | Turkic
bam_Latn | Bambara | Latn | Mande
ben_Beng | Bengali | Beng | Indo-Aryan
ben_Latn | Bengali (Romanized) | Latn | Indo-Aryan
bod_Tibt | Standard Tibetan | Tibt | Sino-Tibetan
bul_Cyrl | Bulgarian | Cyrl | Balto-Slavic
cat_Latn | Catalan | Latn | Romance
ceb_Latn | Cebuano | Latn | Austronesian
ces_Latn | Czech | Latn | Balto-Slavic
ckb_Arab | Central Kurdish | Arab | Iranian
dan_Latn | Danish | Latn | Germanic
deu_Latn | German | Latn | Germanic
ell_Grek | Greek | Grek | Hellenic
eng_Latn | English | Latn | Germanic
est_Latn | Estonian | Latn | Uralic
eus_Latn | Basque | Latn | Basque
fin_Latn | Finnish | Latn | Uralic
fra_Latn | French | Latn | Romance
fuv_Latn | Nigerian Fulfulde | Latn | Atlantic-Congo
gaz_Latn | West Central Oromo | Latn | Afro-Asiatic
grn_Latn | Guarani | Latn | Tupian
guj_Gujr | Gujarati | Gujr | Indo-Aryan
hat_Latn | Haitian Creole | Latn | Atlantic-Congo
hau_Latn | Hausa | Latn | Afro-Asiatic
heb_Hebr | Hebrew | Hebr | Afro-Asiatic
hin_Deva | Hindi | Deva | Indo-Aryan
hin_Latn | Hindi (Romanized) | Latn | Indo-Aryan
hrv_Latn | Croatian | Latn | Balto-Slavic
hun_Latn | Hungarian | Latn | Uralic
hye_Armn | Armenian | Armn | Armenian
ibo_Latn | Igbo | Latn | Atlantic-Congo
ilo_Latn | Ilocano | Latn | Austronesian
ind_Latn | Indonesian | Latn | Austronesian
isl_Latn | Icelandic | Latn | Germanic
ita_Latn | Italian | Latn | Romance
jav_Latn | Javanese | Latn | Austronesian
jpn_Jpan | Japanese | Jpan | Japonic
kac_Latn | Jingpho | Latn | Sino-Tibetan
kan_Knda | Kannada | Knda | Dravidian
kat_Geor | Georgian | Geor | Kartvelian
kaz_Cyrl | Kazakh | Cyrl | Turkic
kea_Latn | Kabuverdianu | Latn | Portuguese Creole
khk_Cyrl | Halh Mongolian | Cyrl | Mongolic
khm_Khmr | Khmer | Khmr | Austroasiatic
kin_Latn | Kinyarwanda | Latn | Atlantic-Congo
kir_Cyrl | Kyrgyz | Cyrl | Turkic
kor_Hang | Korean | Hang | Koreanic
lao_Laoo | Lao | Laoo | Kra-Dai
lin_Latn | Lingala | Latn | Atlantic-Congo
lit_Latn | Lithuanian | Latn | Balto-Slavic
lug_Latn | Ganda | Latn | Atlantic-Congo
luo_Latn | Luo | Latn | Nilo-Saharan
lvs_Latn | Standard Latvian | Latn | Balto-Slavic
mal_Mlym | Malayalam | Mlym | Dravidian
mar_Deva | Marathi | Deva | Indo-Aryan
mkd_Cyrl | Macedonian | Cyrl | Balto-Slavic
mlt_Latn | Maltese | Latn | Afro-Asiatic
mri_Latn | Maori | Latn | Austronesian
mya_Mymr | Burmese | Mymr | Sino-Tibetan
nld_Latn | Dutch | Latn | Germanic
nob_Latn | Norwegian Bokmål | Latn | Germanic
npi_Deva | Nepali | Deva | Indo-Aryan
npi_Latn | Nepali (Romanized) | Latn | Indo-Aryan
nso_Latn | Northern Sotho | Latn | Atlantic-Congo
nya_Latn | Nyanja | Latn | Atlantic-Congo
ory_Orya | Odia | Orya | Indo-Aryan
pan_Guru | Eastern Panjabi | Guru | Indo-Aryan
pbt_Arab | Southern Pashto | Arab | Iranian
pes_Arab | Western Persian | Arab | Iranian
plt_Latn | Plateau Malagasy | Latn | Austronesian
pol_Latn | Polish | Latn | Balto-Slavic
por_Latn | Portuguese | Latn | Romance
ron_Latn | Romanian | Latn | Romance
rus_Cyrl | Russian | Cyrl | Balto-Slavic
shn_Mymr | Shan | Mymr | Kra-Dai
sin_Latn | Sinhala (Romanized) | Latn | Indo-Aryan
sin_Sinh | Sinhala | Sinh | Indo-Aryan
slk_Latn | Slovak | Latn | Balto-Slavic
slv_Latn | Slovenian | Latn | Balto-Slavic
sna_Latn | Shona | Latn | Atlantic-Congo
snd_Arab | Sindhi | Arab | Indo-Aryan
som_Latn | Somali | Latn | Afro-Asiatic
sot_Latn | Southern Sotho | Latn | Atlantic-Congo
spa_Latn | Spanish | Latn | Romance
srp_Cyrl | Serbian | Cyrl | Balto-Slavic
ssw_Latn | Swati | Latn | Atlantic-Congo
sun_Latn | Sundanese | Latn | Austronesian
swe_Latn | Swedish | Latn | Germanic
swh_Latn | Swahili | Latn | Atlantic-Congo
tam_Taml | Tamil | Taml | Dravidian
tel_Telu | Telugu | Telu | Dravidian
tgk_Cyrl | Tajik | Cyrl | Iranian
tgl_Latn | Tagalog | Latn | Austronesian
tha_Thai | Thai | Thai | Kra-Dai
tir_Ethi | Tigrinya | Ethi | Afro-Asiatic
tsn_Latn | Tswana | Latn | Atlantic-Congo
tso_Latn | Tsonga | Latn | Atlantic-Congo
tur_Latn | Turkish | Latn | Turkic
ukr_Cyrl | Ukrainian | Cyrl | Balto-Slavic
urd_Arab | Urdu | Arab | Indo-Aryan
urd_Latn | Urdu (Romanized) | Latn | Indo-Aryan
uzn_Latn | Northern Uzbek | Latn | Turkic
vie_Latn | Vietnamese | Latn | Austroasiatic
war_Latn | Waray | Latn | Austronesian
wol_Latn | Wolof | Latn | Atlantic-Congo
xho_Latn | Xhosa | Latn | Atlantic-Congo
yor_Latn | Yoruba | Latn | Atlantic-Congo
zho_Hans | Chinese (Simplified) | Hans | Sino-Tibetan
zho_Hant | Chinese (Traditional) | Hant | Sino-Tibetan
zsm_Latn | Standard Malay | Latn | Austronesian
zul_Latn | Zulu | Latn | Atlantic-Congo
Provider: maas
Created: 2025-05-20