CreativeLang/scope_simile_generation
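Dataset Overview
- Name: SCOPE Simile
- Purpose: Simile generation, i.e. producing a simile from a literal descriptive sentence.
- Method: A two-step approach. Self-labeled similes are first converted into literal sentences; a seq2seq model is then fine-tuned on the resulting [literal sentence, simile] pairs to generate similes.
- Data source: Collected from the WRITINGPROMPTS and FUNNY subreddits on Reddit; similes are identified by searching for the phrase "like a".
- Data size: 87,843 human-written, self-labeled similes, of which 82,697 are used for training and 5,146 for validation.
- Conversion method: The COMET framework is used to identify the property shared within each simile; the top 5 commonsense properties are selected to form candidate literal versions, which are then ranked by perplexity under a GPT model.
- Language: English
- Created: 2020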
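The collection and conversion steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expression, the function names, and the hard-coded candidate properties are all assumptions, and a toy add-one-smoothed unigram model stands in for GPT perplexity so that the example runs without any model download. In the actual dataset the candidate properties come from COMET.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Assumed pattern: "like a <vehicle>" or "like an <vehicle>", up to the end of the clause.
SIMILE_RE = re.compile(r"\blike an? ([^.,;!?]+)", re.IGNORECASE)


def find_vehicle(sentence: str) -> Optional[str]:
    """Return the text following "like a"/"like an", or None if the phrase is absent."""
    match = SIMILE_RE.search(sentence)
    return match.group(1).strip() if match else None


def literal_candidates(simile: str, properties: List[str]) -> List[str]:
    """Form one literal candidate per property by replacing the "like a ..." span."""
    return [
        SIMILE_RE.sub(lambda _m, p=prop: p, simile, count=1)
        for prop in properties
    ]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def make_unigram_perplexity(corpus: List[str]) -> Callable[[str], float]:
    """Build a toy perplexity scorer (stand-in for a GPT-based scorer)."""
    counts = Counter(tok for line in corpus for tok in tokenize(line))
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def perplexity(sentence: str) -> float:
        tokens = tokenize(sentence)
        if not tokens:
            return float("inf")
        log_prob = sum(
            math.log((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
        )
        return math.exp(-log_prob / len(tokens))

    return perplexity


def rank_candidates(
    candidates: List[str], perplexity: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Sort candidates from lowest (most fluent) to highest perplexity."""
    return sorted(((c, perplexity(c)) for c in candidates), key=lambda item: item[1])


if __name__ == "__main__":
    simile = "The city was like a painting."
    # Hypothetical properties; in the real pipeline these would come from COMET.
    properties = ["beautiful", "colorful"]
    scorer = make_unigram_perplexity(
        ["the city was beautiful", "the view was beautiful"]
    )
    for text, score in rank_candidates(literal_candidates(simile, properties), scorer):
        print(f"{score:8.2f}  {text}")
```

Because the scorer is passed in as a plain callable, a perplexity function backed by a real language model could be swapped in without changing the ranking code.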
Dataset Details
- Paper: Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation
- Metadata: Stored in the Creative Language Toolkit (CLTK)
- CL type: Simile
- Task type: Generation
- Size: Approximately 87,000 similes
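For the generation task, each example pairs a literal sentence with its simile. The sketch below shows one possible way to hold such pairs and turn them into source/target records for seq2seq fine-tuning. The class name, the `source`/`target` keys, and the split helper are hypothetical and are not taken from the dataset's actual schema; the example sentences are purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SimilePair:
    """One [literal sentence, simile] pair (hypothetical field names)."""
    literal: str
    simile: str


def to_seq2seq_example(pair: SimilePair) -> Dict[str, str]:
    """Map a pair to model input (literal) and expected output (simile)."""
    return {"source": pair.literal, "target": pair.simile}


def split_pairs(
    pairs: List[SimilePair], n_validation: int
) -> Tuple[List[SimilePair], List[SimilePair]]:
    """Hold out the last n_validation pairs for validation."""
    if not 0 <= n_validation <= len(pairs):
        raise ValueError("n_validation must be between 0 and len(pairs)")
    cut = len(pairs) - n_validation
    return pairs[:cut], pairs[cut:]


if __name__ == "__main__":
    pairs = [
        SimilePair("The city was beautiful.", "The city was like a painting."),
        SimilePair("He ran fast.", "He ran like a cheetah."),
        SimilePair("The room was silent.", "The room was like a tomb."),
    ]
    train, validation = split_pairs(pairs, n_validation=1)
    print(len(train), len(validation))
    print(to_seq2seq_example(train[0]))
```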
Citation
If you use this dataset, please cite the following paper:
@inproceedings{chakrabarty-etal-2020-generating,
    title = "Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation",
    author = "Chakrabarty, Tuhin and Muresan, Smaranda and Peng, Nanyun",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.524",
    pages = "6455--6469",
    abstract = "Literary tropes, from poetry to stories, are at the crux of human imagination and communication. Figurative language such as a simile go beyond plain expressions to give readers new insights and inspirations. In this paper, we tackle the problem of simile generation. Generating a simile requires proper understanding for effective mapping of properties between two concepts. To this end, we first propose a method to automatically construct a parallel corpus by transforming a large number of similes collected from Reddit to their literal counterpart using structured common sense knowledge. We then propose to fine-tune a pre-trained sequence to sequence model, BART (Lewis et al 2019), on the literal-simile pairs to gain generalizability, so that we can generate novel similes given a literal sentence. Experiments show that our approach generates 88{\%} novel similes that do not share properties with the training data. Human evaluation on an independent set of literal statements shows that our model generates similes better than two literary experts 37{\%} of the time when compared pairwise. We also show how replacing literal sentences with similes from our best model in machine-generated stories improves evocativeness and leads to better acceptance by human judges.",
}



