KazQAD
Source: arXiv. Updated: 2024-04-06. Indexed: 2024-06-21.
Download link:
https://github.com/IS2AI/KazQAD
Description:
KazQAD is an open-domain question answering dataset for the Kazakh language, created by the Institute of Smart Systems and Artificial Intelligence (ISSAI) and collaborating institutions. It contains nearly 6,000 unique questions with short answers and approximately 12,000 passage-level relevance judgments. The questions come from two sources: items translated from the Natural Questions dataset and questions from the Kazakh Unified National Testing (UNT) exam. The accompanying corpus holds more than 800,000 passages from Kazakh Wikipedia. The dataset was built with a combination of machine translation and manual annotation, balancing annotation efficiency with data quality. It can be used in reading comprehension and full open-domain question answering settings, as well as in information retrieval experiments, and is intended to advance natural language processing and information retrieval for Kazakh.
Provided by:
Institute of Smart Systems and Artificial Intelligence (ISSAI)
Created:
2024-04-06



