jvonrad/multilingual-mcq-consistency
Source: Hugging Face | Updated 2026-03-23 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/jvonrad/multilingual-mcq-consistency
---
language:
- en
- de
- id
- pt
- ar
- bn
- sw
- es
- ru
- fr
- ja
- zh
task_categories:
- multiple-choice
pretty_name: Multilingual MCQ Consistency Dataset
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: validation
    path: validation.jsonl
  - split: test
    path: test.jsonl
---
# Multilingual MCQ Consistency Dataset
A multilingual dataset of factual multiple-choice questions covering 12 languages.
## Structure
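Each example contains:
- `fact_id`: fact ID
- `property_id`: property ID
- `subject`, `relation`, `object`: the subject, relation, and object of the fact
- `langs`: a dict of language-specific QA entries, keyed by language code

Each language entry contains:
- `question`: the question text
- `answer_text`: the correct answer text
- `options`: the list of candidate options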
```json
{
  "fact_id": "...",
  "property_id": "...",
  "subject": "...",
  "relation": "...",
  "object": "...",
  "langs": {
    "en": { "question": "...", "answer_text": "...", "options": [...] },
    ...
  }
}
```
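The schema above can be read with the Python standard library alone. The sketch below is a minimal illustration, not an official loader. It assumes that each split file (for example `train.jsonl`) holds one JSON record per line and that each language's `answer_text` matches one entry of its `options` exactly, so the index of the correct option can be recovered. The names `load_split`, `iter_questions`, `LangQuestion`, and `answer_index` are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class LangQuestion:
    """One language-specific MCQ item for a single fact (hypothetical helper type)."""
    fact_id: str
    lang: str
    question: str
    options: list[str]
    answer_index: int  # position of answer_text within options


def load_split(path: str | Path) -> list[dict]:
    """Read a JSONL split file: one fact record per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def iter_questions(records: list[dict]) -> Iterator[LangQuestion]:
    """Flatten fact records into one item per (fact, language) pair."""
    for rec in records:
        for lang, entry in rec["langs"].items():
            options = entry["options"]
            # Assumption: answer_text appears verbatim among the options;
            # list.index raises ValueError if it does not.
            answer_index = options.index(entry["answer_text"])
            yield LangQuestion(
                fact_id=rec["fact_id"],
                lang=lang,
                question=entry["question"],
                options=options,
                answer_index=answer_index,
            )
```

For example, `for q in iter_questions(load_split("train.jsonl")): print(q.lang, q.options[q.answer_index])` prints the gold option for every fact and language pair in the split.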
## Use case
Designed for:
- Reinforcement learning (RL) training, with a reward that combines answer correctness and cross-lingual consistency
- Multilingual evaluation
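The card does not specify how correctness and cross-lingual consistency are combined into a reward. The sketch below shows one possible scheme under explicit assumptions: for a single fact, the model answers the question in each language; a language counts as correct if the chosen option text equals that language's `answer_text`; consistency for a language is the fraction of the other languages whose correctness outcome agrees with it; and the two terms are mixed with hypothetical weights `w_correct` and `w_consistency`. The function name `fact_rewards` and the default weights are illustrative, not part of the dataset.

```python
from __future__ import annotations


def fact_rewards(
    predictions: dict[str, str],
    answers: dict[str, str],
    w_correct: float = 1.0,      # hypothetical weight on correctness
    w_consistency: float = 0.5,  # hypothetical weight on consistency
) -> dict[str, float]:
    """Per-language reward for one fact (hypothetical scheme).

    predictions: language code -> option text chosen by the model
    answers:     language code -> gold answer_text for that language
    Languages missing from predictions are skipped.
    """
    langs = [lang for lang in answers if lang in predictions]
    correct = {lang: predictions[lang] == answers[lang] for lang in langs}
    rewards = {}
    for lang in langs:
        others = [o for o in langs if o != lang]
        if others:
            # Fraction of the other languages whose correctness matches this one.
            agree = sum(correct[o] == correct[lang] for o in others) / len(others)
        else:
            agree = 1.0  # a single language is trivially consistent
        rewards[lang] = w_correct * float(correct[lang]) + w_consistency * agree
    return rewards
```

Agreement on correctness is used here because option texts differ across languages. Under this scheme a fact answered wrongly in every language still earns the consistency term; whether that is desirable depends on the training setup.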
Provided by: jvonrad



