
Cryptic dataset

Source: DataCite Commons. Updated 2025-12-05; indexed 2026-05-07.
Download link:
https://datashare.ed.ac.uk/handle/10283/9132
Description:
The Cryptic dataset is constructed from A Dataset of Cryptic Crossword Clues by George Ho (available at https://cryptics.georgeho.org). This database contains cryptic clues from several sources, including the New York Times, The Guardian, The Times and The Hindu. Each row stores the clue (normally a cryptic sentence or two, along with the length of the answer), the answer (normally a word or two) and the definition (one or a few words, almost always at either the start or the end of the clue).

We extracted every clue in the database that consists of exactly six words and encoded its words with a BERT encoder into six word embeddings of dimension 768. The task is to determine whether the definition appears at the beginning or at the end of the clue, which makes this a binary classification problem with labels {beginning: 1, end: 0}. The input data has shape (n, 1, 6, 768), where n is the number of clues, and each label is 0 or 1.

The extracted data is split by source into three sets: the test set contains clues from The Times, the validation set contains clues from The Hindu, and the training set contains clues from all other sources.
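The labelling and splitting rules described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the ClueRecord fields, the source strings "The Times" and "The Hindu", the removal of a trailing answer length such as "(6)" before counting words, and the case- and punctuation-insensitive matching of the definition are all assumptions. Clues whose definition cannot be located at exactly one end of the clue are dropped.

```python
import re
import string
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout; the real database columns may differ.
@dataclass
class ClueRecord:
    clue: str        # clue text, possibly ending in an answer length like "(6)"
    definition: str  # definition span as it appears in the clue
    source: str      # publication the clue was taken from


LABELS = {"beginning": 1, "end": 0}

# Assumed source names; every other source goes to the training set.
SPLIT_BY_SOURCE = {"The Times": "test", "The Hindu": "validation"}

# Assumption: a trailing answer length such as "(6)" or "(3,4)" is removed
# before the words are counted.
_ENUMERATION = re.compile(r"\s*\([\d,\s-]+\)\s*$")


def clue_words(clue: str) -> list[str]:
    """Split a clue into words after stripping the trailing answer length."""
    return _ENUMERATION.sub("", clue).split()


def _norm(word: str) -> str:
    # Compare words case-insensitively, ignoring leading/trailing punctuation.
    return word.lower().strip(string.punctuation)


def label_clue(rec: ClueRecord) -> Optional[int]:
    """Return 1 if the definition opens the clue, 0 if it closes it, else None."""
    words = [_norm(w) for w in clue_words(rec.clue)]
    if len(words) != 6:
        return None  # only six-word clues are kept
    defn = [_norm(w) for w in rec.definition.split()]
    if not defn:
        return None
    at_start = words[: len(defn)] == defn
    at_end = words[-len(defn):] == defn
    if at_start == at_end:
        return None  # definition found at neither end, or ambiguously at both
    return LABELS["beginning"] if at_start else LABELS["end"]


def build_splits(records: list[ClueRecord]) -> dict[str, list[tuple[list[str], int]]]:
    """Group labelled six-word clues into train/validation/test by source."""
    splits: dict[str, list[tuple[list[str], int]]] = {
        "train": [], "validation": [], "test": []
    }
    for rec in records:
        label = label_clue(rec)
        if label is None:
            continue
        split = SPLIT_BY_SOURCE.get(rec.source, "train")
        splits[split].append((clue_words(rec.clue), label))
    return splits
```

Each kept word list would then be passed through a BERT encoder to obtain a 6 × 768 array per clue; stacking these arrays with a singleton second axis gives the (n, 1, 6, 768) input. The source does not describe how BERT subword tokens were reduced to six word-level embeddings, so that step is left out of the sketch.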
Provided by:
University of Edinburgh. School of Engineering
Created:
2025-12-05