DaNetQA
Source: arXiv · Updated: 2020-10-15 · Indexed: 2024-06-21
Download link:
https://github.com/PragmaticsLab/DaNetQA
Overview:
DaNetQA is a yes/no question answering dataset for Russian, created by the National Research University Higher School of Economics (HSE). It contains 2,691 naturally generated yes/no questions, each paired with a Wikipedia paragraph and an answer grounded in that paragraph. The dataset was collected on the crowdsourcing platform Yandex.Toloka: relevant Wikipedia pages were retrieved with the Google API, and crowd workers labeled the answers manually. DaNetQA is primarily used to evaluate deep contextual encoders such as BERT or XLM-R on Russian question answering, and it was built to address the shortage of Russian-language question answering datasets.
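To make the task format concrete, here is a minimal sketch of how a question, paragraph, and yes/no answer triple might be represented and scored. The class name `YesNoExample`, its field names, and the toy English examples are assumptions made for illustration; they are not taken from the dataset files, whose actual schema should be checked in the repository.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class YesNoExample:
    """One yes/no QA item. Field names are hypothetical, not the dataset's schema."""
    question: str
    passage: str
    answer: bool  # True = yes, False = no


def accuracy(examples: Sequence[YesNoExample],
             predict: Callable[[str, str], bool]) -> float:
    """Fraction of examples where predict(question, passage) matches the label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(predict(ex.question, ex.passage) == ex.answer for ex in examples)
    return correct / len(examples)


def always_yes(question: str, passage: str) -> bool:
    """Trivial baseline that ignores its inputs and always answers yes."""
    return True


# Toy placeholder items written for this sketch; NOT drawn from DaNetQA.
toy_examples = [
    YesNoExample("Is Moscow the capital of Russia?",
                 "Moscow is the capital of Russia.", True),
    YesNoExample("Is the Volga a river in Africa?",
                 "The Volga is a river in Russia.", False),
]

baseline_score = accuracy(toy_examples, always_yes)
```

A contextual encoder such as BERT or XLM-R would take the place of `always_yes`, receiving the question and paragraph and returning a yes/no prediction. The constant baseline is only a reference point for interpreting accuracy figures.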
Provided by:
National Research University Higher School of Economics (HSE)
Created:
2020-10-06



