Reubencf/multilingual-doc-qa
Source: Hugging Face. Updated 2026-04-22. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Reubencf/multilingual-doc-qa
Description:
This dataset contains multilingual question-answer pairs focused on extracting specific factual details from documents, such as page numbers, names, ages, and counts. The samples cover diverse languages, including Arabic, Chinese, Japanese, Spanish, French, German, and Hindi, and demonstrate cross-lingual information retrieval. Each entry pairs a prompt asking for a specific detail with a completion giving the precise answer found in the source text. The dataset contains 8,801 data points and is an instruction-tuning dataset. Its final quality grade is B, with a relative quality improvement of 80.0%. Domain distribution: Other (50%), History (10%), Product-advice (6%). Language distribution: Italian (14%), Japanese (12%), Spanish (12%). Tone distribution: Informative (68%), Clear (18%), Helpful (10%).
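To put the percentages above in absolute terms, here is a minimal sketch that converts the listed language shares into approximate sample counts. It uses only the figures stated in the description (8,801 data points and the three language percentages). The helper name `approx_count` is hypothetical, and because the published percentages are rounded, the results are estimates rather than exact counts.

```python
# Approximate per-language sample counts implied by the listed percentages.
# Assumption: the percentages are rounded shares of the full 8,801 data points.

TOTAL = 8801  # data points, as stated in the description

# Language shares as listed in the description.
language_share = {"Italian": 0.14, "Japanese": 0.12, "Spanish": 0.12}


def approx_count(share: float, total: int = TOTAL) -> int:
    """Return the nearest whole number of samples for a fractional share."""
    return round(share * total)


approx = {lang: approx_count(share) for lang, share in language_share.items()}

for lang, count in approx.items():
    print(f"{lang}: about {count} samples")
```

The same helper can be applied to the domain and tone shares in the same way.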
Provided by:
Reubencf



