FLORES-101

Name: FLORES-101
Creator: OpenDataLab
Published: 2026-05-17 05:30:06
License: No description provided

OpenDataLab. Updated: 2026-05-17. Indexed: 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/FLORES-101

Resource overview:

The FLORES evaluation benchmark consists of 3001 sentences extracted from English Wikipedia, covering a wide range of topics and domains. Professional translators translated these sentences into 101 languages through a carefully controlled process. Because all translations are multilingually aligned, the resulting dataset supports better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems. By publicly releasing a high-quality, high-coverage dataset, the authors hope to foster progress in the machine translation community and beyond. Paper: The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
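The multilingual alignment described above means that sentence i is the same sentence in every language, so any two languages can be paired directly without going through English. The following is a minimal sketch of that idea, not the dataset's official loader: the `corpus` dictionary, its language codes, and the helper names are hypothetical toy stand-ins. If the benchmark is taken to cover 101 languages in total, the number of directed translation pairs is 101 × 100 = 10,100.

```python
from itertools import permutations

# Hypothetical toy data (not taken from the dataset): each language maps to a
# list of sentences, and index i is the same sentence in every language.
corpus = {
    "eng": ["Hello.", "Thank you."],
    "fra": ["Bonjour.", "Merci."],
    "deu": ["Hallo.", "Danke."],
}


def direction_count(num_languages: int) -> int:
    """Number of directed (source, target) pairs among num_languages languages."""
    return num_languages * (num_languages - 1)


def aligned_pairs(data, src, tgt):
    """Pair sentences of two languages by position, relying on the alignment."""
    src_sents, tgt_sents = data[src], data[tgt]
    if len(src_sents) != len(tgt_sents):
        raise ValueError("aligned corpora must have the same number of sentences")
    return list(zip(src_sents, tgt_sents))


def all_directions(data):
    """Build an evaluation set for every directed language pair."""
    return {
        (src, tgt): aligned_pairs(data, src, tgt)
        for src, tgt in permutations(data, 2)
    }


if __name__ == "__main__":
    print(direction_count(101))
    print(aligned_pairs(corpus, "fra", "deu"))
```

With a real copy of the data, the same pairing would apply once each language's sentences are loaded in their original order; the file layout needed for that is not described here.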

Provided by:

OpenDataLab

Created:

2022-05-23

Collected summary

Dataset introduction

Background and challenges

Background overview

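FLORES-101 is a multilingual machine translation evaluation dataset. It contains 3001 sentences extracted from English Wikipedia, covering a wide range of topics, which professional translators translated into 101 languages. The dataset is intended to support quality evaluation for low-resource languages and multilingual translation systems. All translations are multilingually aligned, which improves the accuracy and comparability of evaluations. It was released by Facebook AI Research in 2021 under the CC BY-SA 4.0 license to promote progress in machine translation.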

The summary above was collected and generated by 遇见数据集.
