jonny-vr/multilingual-mcq-consistency
Hugging Face | Updated 2026-03-23 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/jonny-vr/multilingual-mcq-consistency
Resource description:
---
language:
- en
- de
- id
- pt
- ar
- bn
- sw
- es
- ru
- fr
- ja
- zh
task_categories:
- multiple-choice
pretty_name: Multilingual MCQ Consistency Dataset
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: validation
    path: validation.jsonl
  - split: test
    path: test.jsonl
---
# Multilingual MCQ Consistency Dataset
A multilingual dataset of factual multiple-choice QA items covering 12 languages (en, de, id, pt, ar, bn, sw, es, ru, fr, ja, zh).
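The `configs` block in the front matter declares three JSON Lines splits (`train.jsonl`, `validation.jsonl`, `test.jsonl`). The Hugging Face `datasets` library can load them directly with `load_dataset("jonny-vr/multilingual-mcq-consistency")`. The sketch below is a dependency-free alternative. It assumes the files have already been downloaded into a local directory and that each line holds one JSON record. The helper names `read_jsonl` and `load_splits` are hypothetical.

```python
import json
from pathlib import Path

# Split names and file names as declared in the card's `configs` front matter.
SPLIT_FILES = {
    "train": "train.jsonl",
    "validation": "validation.jsonl",
    "test": "test.jsonl",
}


def read_jsonl(path):
    """Read one JSON object per non-empty line (JSON Lines format)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def load_splits(data_dir):
    """Load every split whose file exists under data_dir (assumed local copy)."""
    data_dir = Path(data_dir)
    splits = {}
    for split, filename in SPLIT_FILES.items():
        path = data_dir / filename
        if path.exists():
            splits[split] = read_jsonl(path)
    return splits
```

Splits whose files are missing are skipped rather than raising an error, so a partial download still loads.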
## Structure
Each example contains:
- `fact_id`
- `property_id`
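- `subject`, `relation`, `object`: the underlying fact as a (subject, relation, object) triple
- `langs`: a dict keyed by language code (e.g. `en`), mapping to language-specific QA entries

Each language entry contains:

- `question`: the question text
- `answer_text`: the gold answer text
- `options`: the list of candidate answer options
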
```json
{
  "fact_id": "...",
  "property_id": "...",
  "subject": "...",
  "relation": "...",
  "object": "...",
  "langs": {
    "en": { "question": "...", "answer_text": "...", "options": [...] },
    ...
  }
}
```
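To make the per-language layout concrete, here is a minimal sketch that turns each entry in `langs` into a lettered multiple-choice prompt and its gold letter. It assumes `options` is a list of strings and that `answer_text` matches exactly one of them verbatim. The card does not state this, so `to_mcq` raises `ValueError` if the assumption fails. The function names and the sample values in the test are hypothetical and not taken from the dataset.

```python
import string


def to_mcq(entry):
    """Turn one language entry into (prompt, gold_letter).

    Assumes `answer_text` appears verbatim in `options` (hypothetical);
    list.index raises ValueError otherwise.
    """
    options = entry["options"]
    letters = string.ascii_uppercase[: len(options)]
    lines = [entry["question"]]
    lines += [f"{letter}. {opt}" for letter, opt in zip(letters, options)]
    gold = letters[options.index(entry["answer_text"])]
    return "\n".join(lines), gold


def iter_mcqs(record, languages=None):
    """Yield (lang, prompt, gold_letter) for each language in one record.

    If `languages` is given, only those language codes are yielded.
    """
    for lang, entry in record["langs"].items():
        if languages is None or lang in languages:
            prompt, gold = to_mcq(entry)
            yield lang, prompt, gold
```

Lettering the options lets a grader compare the model's chosen letter with the gold letter instead of string-matching free-form answers.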
## Use case
Designed for:
- Reinforcement learning (RL) training, where the reward is the sum of a correctness score and a cross-lingual consistency score
- Multilingual evaluation
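The card specifies only that the reward combines correctness and cross-lingual consistency. The sketch below fills in one possible definition, and both scoring rules are assumptions. Correctness is 1.0 when the model's answer in a language exactly matches that language's `answer_text`. Consistency is a hypothetical score: the fraction of language pairs whose correctness outcomes agree, so it equals 1.0 when the model is right in every language or wrong in every language. The names `correctness`, `consistency`, and `rewards` are illustrative.

```python
from itertools import combinations


def correctness(predictions, gold):
    """Per-language 1.0/0.0: does the predicted text match the gold answer text?"""
    return {lang: float(predictions.get(lang) == gold[lang]) for lang in gold}


def consistency(correct):
    """Hypothetical consistency score: fraction of language pairs whose
    correctness outcomes agree (1.0 when all right or all wrong)."""
    pairs = list(combinations(sorted(correct), 2))
    if not pairs:  # a single language is trivially consistent
        return 1.0
    agree = sum(correct[a] == correct[b] for a, b in pairs)
    return agree / len(pairs)


def rewards(predictions, gold):
    """Per-language reward = correctness + shared cross-lingual consistency."""
    correct = correctness(predictions, gold)
    cons = consistency(correct)
    return {lang: correct[lang] + cons for lang in correct}
```

Because the consistency term is shared across all languages of a fact, a model that is right in some languages but wrong in others earns less than one that is uniformly right. Pairwise agreement is only one choice. A stricter variant could compare the selected options themselves rather than their correctness.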
Provided by:
jonny-vr



