BiPaR

Source: arXiv. Updated: 2019-10-11. Indexed: 2024-06-21.
Download link:
https://multinlp.github.io/BiPaR/
Overview:
BiPaR is a bilingual parallel, novel-style machine reading comprehension dataset created by the School of Computer Science and Technology at Soochow University to support multilingual and cross-lingual reading comprehension research. It contains 3,667 parallel Chinese-English paragraphs, from which 14,668 parallel question-answer pairs were constructed by crowdworkers under a strict quality-control process. Its defining feature is that every (paragraph, question, answer) triple is bilingually parallel, and all content is drawn from novels. This opens a new domain for machine reading comprehension and poses challenges such as coreference resolution, multi-sentence reasoning, and understanding implicit causal relations.
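
To make the idea of a bilingually parallel (paragraph, question, answer) triple concrete, here is a minimal Python sketch. It is not the dataset's actual file format or loader: the class name `ParallelTriple`, the helper `make_example`, the language keys `"en"` and `"zh"`, and the toy sentences are all hypothetical and invented for illustration. The choice to return the answer in the paragraph's language for cross-lingual examples is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelTriple:
    """One (paragraph, question, answer) triple, each stored in both languages.

    Hypothetical layout: each field maps a language code to its text.
    """
    paragraph: dict
    question: dict
    answer: dict


def make_example(triple, question_lang, paragraph_lang):
    """Build one reading-comprehension example from a parallel triple.

    Same language for both arguments gives a monolingual example;
    different languages give a cross-lingual one. The answer is taken
    in the paragraph's language (an assumption of this sketch).
    """
    return {
        "paragraph": triple.paragraph[paragraph_lang],
        "question": triple.question[question_lang],
        "answer": triple.answer[paragraph_lang],
    }


# Toy placeholder text, NOT taken from BiPaR.
toy = ParallelTriple(
    paragraph={
        "en": "He closed the door because the wind was cold.",
        "zh": "因为风很冷,他关上了门。",
    },
    question={
        "en": "Why did he close the door?",
        "zh": "他为什么关门?",
    },
    answer={
        "en": "because the wind was cold",
        "zh": "因为风很冷",
    },
)

monolingual_en = make_example(toy, question_lang="en", paragraph_lang="en")
cross_lingual = make_example(toy, question_lang="en", paragraph_lang="zh")
```

Because both language versions live in one record, a single triple can yield two monolingual examples and two cross-lingual ones without any additional annotation.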
Provided by:
School of Computer Science and Technology, Soochow University
Created:
2019-10-11