
Disco-Bench

Source: arXiv. Updated: 2023-07-22. Indexed: 2024-06-21.
Download link:
https://github.com/longyuewangdcu/Disco-Bench
Resource overview:
Disco-Bench is a dataset created by Tencent AI Lab for evaluating how well language models handle discourse phenomena. It comprises 9 document-level test sets, drawn primarily from the literary domain and written in Chinese and/or English, that assess model performance on NLP tasks spanning understanding, translation, and generation. Its construction involved extracting rich discourse phenomena (such as cohesion and coherence) from literary texts and designing a diagnostic test suite that checks whether a model has learned discourse knowledge. Disco-Bench can be used to evaluate the text understanding and generation abilities of monolingual and cross-lingual models, and to improve how models handle discourse information.
Provided by:
Tencent AI Lab
Created:
2023-07-16