DogWhistle
Source: arXiv. Updated: 2021-06-08. Indexed: 2024-06-21.
Download link:
https://competitions.codalab.org/competitions/30451
Description:
DogWhistle is a large-scale Chinese dataset created by Microsoft Research Asia for studying, from a computational linguistics perspective, how cant is understood and created. The data was collected through a carefully designed online game and contains rich and diverse cant for a wide range of hidden words. Cant creation in the dataset emphasizes semantics over morphology, and emojis are allowed. The dataset is intended to test pre-trained language models on deep language understanding, common sense, and world knowledge. It also serves as a complex linguistic resource for intermediate-task transfer, helping models perform better on other tasks.
Provider:
Microsoft Research Asia
Created:
2021-04-07



