Molweni
Source: arXiv · Updated: 2020-11-07 · Indexed: 2024-06-21
Download link:
https://github.com/HIT-SCIR/Molweni
Description:
Molweni is a machine reading comprehension (MRC) dataset for multi-party dialogue, developed jointly by Harbin Institute of Technology and the National University of Singapore. It is derived from the Ubuntu Chat Corpus and contains 10,000 dialogues comprising 88,303 utterances, annotated with 30,066 questions, both answerable and unanswerable. What distinguishes Molweni is that every dialogue also carries discourse dependency annotations in a modified Segmented Discourse Representation Theory (SDRT) style, for a total of 78,245 discourse relations. The dataset is intended mainly for discourse parsing of multi-party dialogue and as a challenge for existing MRC models, particularly where discourse relations are non-adjacent or structurally complex.
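To make the discourse annotations and headline counts more concrete, here is a minimal Python sketch. It is not the dataset's actual file format or loader: the `Relation` class, the `non_adjacent` helper, the toy dialogue, and the relation labels are all invented for illustration. Only the four totals in `STATS` come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical container for one discourse dependency link."""
    head: int       # index of the utterance being responded to
    dependent: int  # index of the responding utterance
    label: str      # relation type (illustrative label, not the official inventory)


def non_adjacent(relations):
    """Return the links whose two utterances are not neighbours in the dialogue."""
    return [r for r in relations if abs(r.dependent - r.head) > 1]


# Invented toy dialogue with three speakers; not taken from Molweni.
utterances = [
    ("alice", "my wifi card is not detected after the upgrade"),
    ("bob", "which kernel are you running?"),
    ("carol", "same problem here, reinstalling the driver fixed it"),
    ("alice", "the one that shipped with the release"),
]
relations = [
    Relation(0, 1, "clarification-question"),
    Relation(0, 2, "comment"),           # skips over utterance 1
    Relation(1, 3, "question-answer"),   # skips over utterance 2
]

# Totals quoted in the description above.
STATS = {"dialogues": 10_000, "utterances": 88_303,
         "questions": 30_066, "relations": 78_245}
per_dialogue = {k: v / STATS["dialogues"] for k, v in STATS.items() if k != "dialogues"}

if __name__ == "__main__":
    for r in non_adjacent(relations):
        print(f"{r.label}: utterance {r.head} -> utterance {r.dependent}")
    for name, avg in per_dialogue.items():
        print(f"average {name} per dialogue: {avg:.2f}")
```

In this toy case two of the three links connect utterances that are not next to each other, which is the kind of structure the description says is difficult for MRC models. The averages follow directly from the quoted totals: roughly 8.8 utterances, 3.0 questions, and 7.8 discourse relations per dialogue.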
Provider:
Harbin Institute of Technology
Created:
2020-04-10



