CohereForAI/aya_evaluation_suite
Dataset overview
Dataset name
Aya Evaluation Suite
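Dataset description
Aya Evaluation Suite contains a total of 26,750 open-ended, conversation-style prompts for evaluating the quality of multilingual open-ended generation. To balance language coverage against the quality that comes with human curation, we created an evaluation suite that includes:
- Human-curated examples in 7 languages (tur, eng, yor, arb, zho, por, tel) → aya-human-annotated.
- Machine translations of handpicked examples into 101 languages → dolly-machine-translated.
- Human post-edited translations into 6 languages (hin, srp, rus, fra, arb, spa) → dolly-human-edited.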
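Dataset composition
The dataset includes the following subsets:
- aya-human-annotated: 250 original human-written prompts in each of 7 languages.
- dolly-machine-translated: 200 prompts hand-selected from databricks-dolly-15k and automatically translated from English into 101 languages (114 dialects in total) with the NLLB model.
- dolly-human-edited: 200 dolly-machine-translated prompts, post-edited by fluent speakers of 6 languages.
Data fields
- id: unique ID of the data point.
- inputs: prompt or input to the language model.
- targets: completion or output of the language model (not available for dolly-human-edited).
- language: language of the prompt and completion.
- script: writing system of the language.
- source_id: corresponding original row index in the databricks-dolly-15k dataset (only for the dolly-machine-translated and dolly-human-edited subsets).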
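The field list can be mirrored by a small typed record, which makes the optional fields explicit. The sketch below is a hypothetical helper (`AyaExample` and `from_row` are illustrative names, not part of any library); it assumes each row arrives as a plain Python dict keyed by the field names above, and treats `targets` and `source_id` as optional because some subsets do not provide them.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AyaExample:
    """Typed view of one row; a hypothetical helper, not part of any library."""

    id: int
    inputs: str
    language: str  # three-letter ISO code, e.g. "eng"
    script: str  # writing-system code, e.g. "Latn"
    targets: Optional[str] = None  # not available in dolly-human-edited
    source_id: Optional[int] = None  # only in the two dolly-* subsets

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "AyaExample":
        # Required fields raise KeyError if missing; optional ones default to None.
        targets = row.get("targets")
        if targets == "-":
            # Assumption: "-" is a placeholder, as in the dolly-human-edited example.
            targets = None
        source_id = row.get("source_id")
        return cls(
            id=int(row["id"]),
            inputs=row["inputs"],
            language=row["language"],
            script=row["script"],
            targets=targets,
            source_id=None if source_id is None else int(source_id),
        )


example = AyaExample.from_row(
    {"id": 42, "inputs": "What day is known as Star Wars Day?",
     "targets": "May 4th (May the 4th be with you!)",
     "language": "eng", "script": "Latn"}
)
print(example.source_id)  # None
```

Treating `"-"` as "no target" is a design choice under the stated assumption; drop that branch if you prefer to keep the raw value.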
Data instances
Example data instances from the Aya Evaluation Suite subsets:
aya-human-annotated
```json
{
  "id": 42,
  "inputs": "What day is known as Star Wars Day?",
  "targets": "May 4th (May the 4th be with you!)",
  "language": "eng",
  "script": "Latn"
}
```
dolly-machine-translated
```json
{
  "id": 2,
  "inputs": "How to escape from a helicopter trapped in water ?",
  "targets": "If you are ever trapped inside a helicopter while submerged in water, it’s best to try and remain calm until the cabin is completely underwater. It’s better to wait for pressure to be equalized, before you try to open the door or break the glass to escape.",
  "language": "eng",
  "script": "Latn",
  "source_id": 6060
}
```
dolly-human-edited
```json
{
  "id": 2,
  "inputs": "Comment peut-on s'échapper d'un hélicoptère piégé dans l'eau ?",
  "targets": "-",
  "language": "fra",
  "script": "Latn",
  "source_id": 6060
}
```
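Because both dolly-* subsets keep `source_id`, the dolly-machine-translated and dolly-human-edited examples above (both `source_id` 6060) can be tied back to the same databricks-dolly-15k row. The sketch below groups rows by `source_id`; `group_by_source` is a hypothetical helper, and the rows are the examples above written as Python dicts (targets omitted for brevity).

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping


def group_by_source(rows: List[Mapping[str, Any]]) -> Dict[int, Dict[str, str]]:
    """Map each source_id to {language: prompt}.

    Rows without a source_id (e.g. aya-human-annotated) are skipped. If several
    rows share the same (source_id, language), the last one wins; key on the
    subset name as well if you need to keep both versions.
    """
    grouped: Dict[int, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        sid = row.get("source_id")
        if sid is None:
            continue
        grouped[sid][row["language"]] = row["inputs"]
    return dict(grouped)


rows = [
    {"id": 2, "inputs": "How to escape from a helicopter trapped in water ?",
     "language": "eng", "script": "Latn", "source_id": 6060},
    {"id": 2, "inputs": "Comment peut-on s'échapper d'un hélicoptère piégé dans l'eau ?",
     "language": "fra", "script": "Latn", "source_id": 6060},
    {"id": 42, "inputs": "What day is known as Star Wars Day?",
     "language": "eng", "script": "Latn"},
]
linked = group_by_source(rows)
print(linked[6060]["eng"])  # How to escape from a helicopter trapped in water ?
```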
Language statistics
aya-human-annotated
| ISO Code | Language | Resources |
|---|---|---|
| tel | Telugu | Low |
| yor | Yorùbá | Low |
| arb | Arabic | High |
| tur | Turkish | High |
| por | Portuguese | High |
| zho | Chinese (Simplified) | High |
| eng | English | High |
dolly-machine-translated
| ISO Code | Language | Resources |
|---|---|---|
| ace | Achinese | Low |
| afr | Afrikaans | Mid |
| ... | ... | ... |
| zul | Zulu | Low |
dolly-human-edited
| ISO Code | Language | Resources |
|---|---|---|
| arb | Arabic | High |
| fra | French | High |
| hin | Hindi | High |
| rus | Russian | High |
| spa | Spanish | High |
| srp | Serbian | High |
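The fully listed tables can be summarized programmatically. The sketch below transcribes the aya-human-annotated and dolly-human-edited tables into dicts (the dolly-machine-translated table is elided here, so it is not reproduced), counts languages per resource level with `collections.Counter`, and finds the languages shared by both human-produced subsets. `resource_counts` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Tuple

# Transcribed from the tables above: ISO code -> (language, resource level).
AYA_HUMAN_ANNOTATED: Dict[str, Tuple[str, str]] = {
    "tel": ("Telugu", "Low"),
    "yor": ("Yorùbá", "Low"),
    "arb": ("Arabic", "High"),
    "tur": ("Turkish", "High"),
    "por": ("Portuguese", "High"),
    "zho": ("Chinese (Simplified)", "High"),
    "eng": ("English", "High"),
}
DOLLY_HUMAN_EDITED: Dict[str, Tuple[str, str]] = {
    "arb": ("Arabic", "High"),
    "fra": ("French", "High"),
    "hin": ("Hindi", "High"),
    "rus": ("Russian", "High"),
    "spa": ("Spanish", "High"),
    "srp": ("Serbian", "High"),
}


def resource_counts(table: Dict[str, Tuple[str, str]]) -> Counter:
    """Count how many languages fall into each resource level."""
    return Counter(level for _, level in table.values())


print(resource_counts(AYA_HUMAN_ANNOTATED))  # Counter({'High': 5, 'Low': 2})
print(sorted(AYA_HUMAN_ANNOTATED.keys() & DOLLY_HUMAN_EDITED.keys()))  # ['arb']
```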
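Known limitations
- Translation quality: the expressiveness of the dolly-machine-translated subset is limited by the quality of the translation model, which may distort estimates of model ability in languages where the translations are inadequate. If you use this subset for testing, we recommend reporting results alongside the professionally post-edited dolly-human-edited subset or the aya-human-annotated set.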
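One way to follow this recommendation is to score the same prompts in both subsets and report the results side by side. The sketch below assumes you already have a per-prompt score (from whatever metric or judge you use) keyed by `(language, source_id)` for each subset; `paired_report` is a hypothetical helper, and the scores and the `source_id` 17 are made-up values for illustration only. Only keys present in both subsets are compared, so the two averages cover identical prompts.

```python
from statistics import mean
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (language, source_id)


def paired_report(
    machine: Dict[Key, float], edited: Dict[Key, float]
) -> Dict[str, Dict[str, float]]:
    """Per-language mean scores on prompts present in both subsets."""
    shared = machine.keys() & edited.keys()
    by_lang: Dict[str, List[Key]] = {}
    for key in shared:
        by_lang.setdefault(key[0], []).append(key)
    report: Dict[str, Dict[str, float]] = {}
    for lang, keys in sorted(by_lang.items()):
        report[lang] = {
            "n": len(keys),
            "machine_translated": mean(machine[k] for k in keys),
            "human_edited": mean(edited[k] for k in keys),
        }
    return report


# Illustrative scores only (not real results); ("rus", 6060) has no
# post-edited counterpart here, so it is excluded from the comparison.
machine_scores = {("fra", 6060): 0.5, ("fra", 17): 1.0, ("rus", 6060): 0.25}
edited_scores = {("fra", 6060): 0.75, ("fra", 17): 1.0}
print(paired_report(machine_scores, edited_scores))
```

Restricting the comparison to shared keys is the design choice that makes the two columns comparable; averaging each subset over all of its own prompts would mix different prompt sets.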



