DiscoEval
Source: arXiv | Updated: 2019-11-07 | Indexed: 2024-06-21
Download link:
https://github.com/ZeweiChu/DiscoEval
Resource description:
DiscoEval is a test suite for evaluating discourse-related knowledge in sentence representations, developed by researchers at the University of Chicago and the Toyota Technological Institute at Chicago. It comprises several task groups spanning multiple domains, including Wikipedia, stories, dialogues, and scientific literature, and uses probing tasks to assess the discourse knowledge captured by pretrained sentence representations. The tasks are built on sentence ordering, annotated discourse relations, and discourse coherence; the data are either generated semi-automatically or derived from human annotations. Alongside the evaluation suite, the authors propose a new set of multi-task learning objectives intended to make sentence encoders rely more on the distributional semantics of text and thereby better capture the information in document structure.
Provided by:
University of Chicago, Illinois, USA; Toyota Technological Institute at Chicago, Illinois, USA
Created:
2019-08-31



