ehasler/mlqa

Name: ehasler/mlqa
Creator: ehasler
Published: 2026-04-28 16:37:02
License: Not specified

Source: Hugging Face
Updated: 2026-04-28
Indexed: 2026-05-03

Download link:

https://hf-mirror.com/datasets/ehasler/mlqa

Description:

MLQA is a multilingual question answering dataset covering seven languages: Arabic (ar), German (de), English (en), Spanish (es), Hindi (hi), Vietnamese (vi), and Chinese (zh). Each example consists of a context passage, a question about that passage, one or more answers (each given as a character start position in the context plus the answer text), and a unique ID. The dataset provides only validation and test splits and is designed for evaluating cross-lingual question answering systems. For example, the English configuration has 11,590 test examples and 1,148 validation examples; the other languages have similar but somewhat different sizes. It is suited to research on machine reading comprehension, question answering, and cross-lingual NLP; because it has no training split, it is used mainly to evaluate QA models rather than to train them.
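To make the record layout concrete, here is a minimal sketch of one example and a check that each answer's start position really points at its text in the context. It assumes the SQuAD-style field names (`id`, `context`, `question`, `answers.answer_start`, `answers.text`, with answers stored as parallel lists) that are common for QA datasets on Hugging Face; the description above does not name the columns of ehasler/mlqa, so verify them after loading. The `MLQARecord` type, the `answers_align` helper, and the sample record are hypothetical illustrations, not part of the dataset.

```python
from typing import List, TypedDict


class Answers(TypedDict):
    """Answer field in an assumed SQuAD-style layout: parallel lists."""
    answer_start: List[int]  # character offset of each answer in the context
    text: List[str]          # answer strings, same order as answer_start


class MLQARecord(TypedDict):
    """One question-answer example (field names are assumptions)."""
    id: str
    context: str
    question: str
    answers: Answers


def answers_align(record: MLQARecord) -> bool:
    """Return True if every answer text occurs at its stated character offset."""
    starts = record["answers"]["answer_start"]
    texts = record["answers"]["text"]
    # Require at least one answer and exactly one offset per answer text.
    if not texts or len(starts) != len(texts):
        return False
    context = record["context"]
    return all(context[s:s + len(t)] == t for s, t in zip(starts, texts))


# Hypothetical record built for illustration, not taken from the dataset.
record: MLQARecord = {
    "id": "example-0001",
    "context": "MLQA covers seven languages, including Hindi and Vietnamese.",
    "question": "How many languages does MLQA cover?",
    "answers": {"answer_start": [12], "text": ["seven"]},
}

print(answers_align(record))  # True
```

Running a check like this after loading or converting the data is a cheap sanity test: a shifted `answer_start` would otherwise silently corrupt span-based scoring.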

Provider:

ehasler