Unfun Corpus
Source: arXiv | Updated: 2024-02-23 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2403.00794v1
Resource description:
Unfun Corpus is a dataset developed by a research team at Columbia University to support computational humor research by editing texts to remove their humor. It contains approximately 11,831 edited satirical news headlines from The Onion, each revised to read as serious. Large language models (LLMs) were used in building the dataset to edit humorous texts and generate non-humorous counterparts to the originals. Its main applications are improving humor detection and generation systems and exploring the fundamental properties of humor.
Provider:
Columbia University
Created:
2024-02-23



