
Dataset of limericks for computational poetics

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5520077
Description:
This dataset comprises 98k limericks scraped from The Omnificent English Dictionary In Limerick Form (OEDILF). It is a subset of the full scraped collection, filtered to pass a basic test of standard limerick form (i.e., five lines, no emojis, no symbols). Each limerick was written by a human contributor and passed the site's rigorous moderation process.

The dataset is released alongside two companion papers: "BPoMP: The Benchmark of Poetic Minimal Pairs – Limericks, Rhyme, and Narrative Coherence" (Abdibayev, Riddell, Rockmore, RANLP 2021) and "Automating the Detection of Poetic Features: The Limerick as Model Organism" (Abdibayev, Riddell, Igarashi, Rockmore, SIGHUM 2021). It is intended primarily for NLP researchers studying the formal structure of poetry and, more generally, computational poetics.

Each limerick is accompanied by metadata: author information, its id on the website, and an "is_limerick" field, which records whether the limerick was recognized by our custom filter for formal limerick properties (this tagging was a goal of the SIGHUM paper and reflects the results reported there; see the paper for details). Because every entry is a human-written, moderated limerick, "is_limerick"=True marks a true positive, while "is_limerick"=False is (almost surely) a false negative. Our filter identifies 70% of the entries as limericks, and we provide the tagging as a benchmark for the community to improve upon.

With these considerations in mind, we hope the NLP community will use this dataset to study the poetic knowledge of language models trained on large corpora, since many of their properties remain poorly understood. We are excited for the possibilities ahead!

UPDATE: we have released a new version of the dataset that contains all of the limericks we planned to publish. The previous version (v2) was created with code that contained a bug, which lowered the number of available limericks.
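As a rough illustration of how the basic form test and the "is_limerick" tag might be used, here is a minimal Python sketch. It is not the authors' filter: the allowed-character set, the record key "limerick", and the two sample records are assumptions made for this example (only the "is_limerick" field name comes from the description above), and the sample texts are invented, not taken from the dataset.

```python
import re

# Assumption: "no emojis, no symbols" is approximated as "only ASCII letters,
# digits, whitespace and common punctuation". The real filter may differ.
ALLOWED = re.compile(r"^[A-Za-z0-9\s.,;:!?'\"()\-]*$")


def passes_basic_form(text: str) -> bool:
    """Return True if text has exactly five non-empty lines of allowed characters."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return len(lines) == 5 and all(ALLOWED.match(ln) for ln in lines)


def split_by_tag(records):
    """Partition records into (tagged, untagged) on the "is_limerick" field."""
    tagged = [r for r in records if r.get("is_limerick")]
    untagged = [r for r in records if not r.get("is_limerick")]
    return tagged, untagged


# Invented sample records; the "limerick" key is a hypothetical field name.
sample_text = (
    "A coder who worked late at night\n"
    "Found a bug that gave her a fright,\n"
    "She patched it with care,\n"
    "Then leaned back in her chair,\n"
    "And the tests all turned green with delight."
)
records = [
    {"limerick": sample_text, "is_limerick": True},
    {"limerick": sample_text, "is_limerick": False},
]

tagged, untagged = split_by_tag(records)
# Share of entries the filter recognized; the untagged remainder would be
# the (almost surely) false negatives described above.
recall_estimate = len(tagged) / len(records)
```

Because every entry is assumed to be a genuine limerick, the share of tagged entries can be read as an estimate of a detector's recall, which is how an improved filter could be compared against the provided tags.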
Created:
2021-11-24