Cryptonite
arXiv, updated 2021-11-02, indexed 2024-08-06
Download link:
http://arxiv.org/abs/2103.01242v2
Description:
Cryptonite is a large-scale dataset created at Tel Aviv University, containing 523,114 clues drawn from professionally written English cryptic crosswords. It is designed to address extreme ambiguity in natural language processing: each clue carries a misleading surface reading, and solving it requires resolving semantic, syntactic, and phonetic ambiguity as well as applying world knowledge. The clues were collected from The Times and The Telegraph and then preprocessed to ensure data quality. Cryptonite is intended mainly for evaluating and improving how well models handle complex linguistic ambiguity.
Provided by:
Tel Aviv University
Created:
2021-03-02



