NYK-MS

Name: NYK-MS
Creator: Peking University
Published: 2024-09-02 16:14:49
License: No description available

Source: arXiv, 2024-09-02; updated 2024-09-06

Download link:

https://github.com/jmhessel/caption_contest_corpus

Resource summary:

The NYK-MS dataset is a multimodal metaphor and sarcasm understanding benchmark created by Peking University. It contains 1,583 metaphor samples and 1,578 sarcasm samples, 3,161 in total. The data come from The New Yorker Cartoon Caption Contest, which pairs cartoons drawn by professional cartoonists with captions submitted by readers. The annotations went through multiple rounds to improve consistency and quality, and GPT-4V was used to assist the annotators. The benchmark covers seven metaphor and sarcasm understanding tasks, including classification, word detection, and explanation, and is intended for research in natural language processing and computer vision.

Provider:

Peking University

Created:

2024-09-02

Original information summary

Dataset overview

Dataset name

Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest

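Dataset source

The New Yorker Caption Contest

Dataset description

This dataset provides humor-understanding benchmarks built from The New Yorker Cartoon Caption Contest. It includes several tasks: matching, quality ranking, and explanation generation.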

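Dataset tasks

Matching task: select the caption that best fits the cartoon.
Ranking task: compare candidate captions and select the better one.
Explanation generation task: generate an explanation of why a caption is funny.

Dataset structure

Matching task: several caption choices (five in the example below) and the corresponding image description; the label gives the correct answer.
Ranking task: two caption choices and the corresponding image; the label gives the correct answer.
Explanation generation task: a caption and its explanation.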

Dataset examples

Matching task example

```json
{
  "caption_choices": [
    "Tell me about your childhood very quickly.",
    "Believe me . . . its whats UNDER the ground thats most interesting.",
    "Stop me if youve heard this one.",
    "I have trouble saying no.",
    "Yes, I see the train but I think we can beat it."
  ],
  "contest_number": 49,
  "entities": [
    "https://en.wikipedia.org/wiki/Rule_of_three_(writing)",
    "https://en.wikipedia.org/wiki/Bar_joke",
    "https://en.wikipedia.org/wiki/Religious_institute"
  ],
  "from_description": "scene: a bar description: Two priests and a rabbi are walking into a bar, as the bartender and another patron look on. The bartender talks on the phone while looking skeptically at the incoming crew. uncanny: The scene depicts a very stereotypical bar joke that would be unlikely to be encountered in real life; the skepticism of the bartender suggests that he is aware he is seeing this trope, and is explaining it to someone on the phone. entities: Rule_of_three_(writing), Bar_joke, Religious_institute. choices A: Tell me about your childhood very quickly. B: Believe me . . . its whats UNDER the ground thats most interesting. C: Stop me if youve heard this one. D: I have trouble saying no. E: Yes, I see the train but I think we can beat it.",
  "image": "<PIL.JpegImagePlugin.JpegImageFile image mode=L size=323x231 at 0x7F34F283E9D0>",
  "image_description": "Two priests and a rabbi are walking into a bar, as the bartender and another patron look on. The bartender talks on the phone while looking skeptically at the incoming crew.",
  "image_location": "a bar",
  "image_uncanny_description": "The scene depicts a very stereotypical bar joke that would be unlikely to be encountered in real life; the skepticism of the bartender suggests that he is aware he is seeing this trope, and is explaining it to someone on the phone.",
  "instance_id": "21125bb8787b4e7e82aa3b0a1cba1571",
  "label": "C",
  "n_tokens_label": 1,
  "questions": [
    "What is the bartender saying on the phone in response to the living, breathing, stereotypical bar joke that is unfolding?"
  ]
}
```
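In the matching instance, the letter in `label` indexes into `caption_choices` in order (the `from_description` field lists the same captions as A to E). A minimal sketch of decoding the label and scoring predictions follows; it assumes plain accuracy as the metric, and the helper names `label_to_caption` and `accuracy` are hypothetical, not part of the corpus tooling.

```python
import json

# Trimmed copy of the matching-task instance above (only the fields used here).
RAW = """{
  "caption_choices": [
    "Tell me about your childhood very quickly.",
    "Believe me . . . its whats UNDER the ground thats most interesting.",
    "Stop me if youve heard this one.",
    "I have trouble saying no.",
    "Yes, I see the train but I think we can beat it."
  ],
  "label": "C"
}"""


def label_to_caption(instance: dict) -> str:
    """Map a letter label ("A", "B", ...) to the caption text it points at."""
    index = ord(instance["label"]) - ord("A")
    return instance["caption_choices"][index]


def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predicted letters that equal the gold letters (hypothetical metric helper)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


instance = json.loads(RAW)
print(label_to_caption(instance))  # Stop me if youve heard this one.
print(accuracy(["C", "A", "B"], ["C", "A", "D"]))  # 0.6666666666666666
```

Comparing letters rather than caption strings keeps the scorer independent of caption text, which contains unusual spacing and dropped apostrophes.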

Ranking task example

```json
{
  "choices": {
    "A": "Looks to be a herniated disco.",
    "B": "Everyone, wish upon a star!"
  },
  "image": "fc79106cf3660f5b81cdbeed0f968d98.jpg",
  "instance_id": "cba6d1ce5711ad56c31e5577f3207ac3"
}
```

Explanation generation task example

```json
{
  "caption": "Please! I have a wife and two thousand kids!",
  "contest_number": 509,
  "explanation": "A play on the common plea people use in dire situations: I have a wife and two kids; this is stated to try to have people take mercy and not kill someone. But here, the victim of the bear is a fish about to be eaten, and fish tend to have many more than two kids, so the phrase is updated with the fish-version of it: two thousand kids.",
  "n_expl_toks": 70
}
```

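Dataset download

Cartoon images: download link
Raw annotation files: download link
Task split files: download link
Explanation generation data: download link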

Citation

If you use this dataset, please cite the following:

@inproceedings{hessel2023androids, title={Do Androids Laugh at Electric Sheep? {Humor} ``Understanding'' Benchmarks from {The New Yorker Caption Contest}}, author={Hessel, Jack and Marasovi{\'c}, Ana and Hwang, Jena D. and Lee, Lillian and Da, Jeff and Zellers, Rowan and Mankoff, Robert and Choi, Yejin}, booktitle={Proceedings of the ACL}, year={2023} }

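Compiled summary

Dataset introduction

Construction

NYK-MS is built mainly on data from The New Yorker Cartoon Caption Contest, which consists of cartoons drawn by professional cartoonists and captions submitted by readers. The researchers selected samples containing metaphor and sarcasm and annotated each one over multiple rounds to ensure annotation consistency and quality. A GUI and GPT-4V were used during annotation to improve efficiency. The final dataset contains 1,583 samples for metaphor understanding and 1,578 for sarcasm understanding, and every task was annotated in detail by at least three annotators.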
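The source says each sample was labeled by at least three annotators over several rounds to improve consistency, but it does not say how labels were merged or how agreement was measured. Purely as an illustration, the sketch below assumes a strict majority vote for aggregation and Fleiss' kappa as the agreement statistic; both choices, the function names, and the toy ratings are hypothetical.

```python
from collections import Counter
from typing import Optional


def majority_vote(labels: list) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def fleiss_kappa(ratings: list) -> float:
    """Fleiss' kappa for items that each received the same number of ratings."""
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(ratings[0])  # ratings per item
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    num_items = len(ratings)
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]
    # Observed agreement: share of agreeing rater pairs per item, averaged over items.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1)) for c in counts
    ) / num_items
    # Chance agreement: sum of squared overall category proportions.
    p_e = sum(
        (sum(c[cat] for c in counts) / (num_items * n)) ** 2 for cat in categories
    )
    if p_e == 1.0:
        return 1.0  # only one category was ever used; treated here as perfect agreement
    return (p_bar - p_e) / (1 - p_e)


# Three hypothetical items, each labeled by three annotators ("does it contain sarcasm?").
ratings = [["yes", "yes", "yes"], ["no", "no", "no"], ["yes", "no", "yes"]]
print([majority_vote(item) for item in ratings])  # ['yes', 'no', 'yes']
print(round(fleiss_kappa(ratings), 2))  # 0.55
```

Fleiss' kappa is used here because it handles more than two raters per item; with a fixed three annotators per sample, the multi-round process could be tracked by recomputing it after each round.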

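Usage

To use NYK-MS, first preprocess the data, for example by cleaning and labeling it. Then train and evaluate machine learning models such as BERT and ViT on it. During training, modality-alignment methods such as contrastive learning and optimal transport can be used to improve model performance. Finally, evaluate the model with metrics such as accuracy, recall, and F1 score.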
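The usage notes mention contrastive learning and optimal transport for aligning the image and text modalities but give no formulas. The sketch below shows textbook versions of both, assuming a CLIP-style symmetric InfoNCE loss and entropic optimal transport solved with Sinkhorn-Knopp iterations; it uses plain Python lists so it runs without dependencies, and the function names, temperature, epsilon, and toy embeddings are hypothetical. A real training setup would use a tensor library and learned encoders.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_entropy(logits: list, target: int) -> float:
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def info_nce(image_embs: list, text_embs: list, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: image k and text k form the positive pair."""
    n = len(image_embs)
    sims = [[cosine(i, t) / temperature for t in text_embs] for i in image_embs]
    image_to_text = sum(cross_entropy(sims[k], k) for k in range(n)) / n
    columns = [[sims[r][c] for r in range(n)] for c in range(n)]
    text_to_image = sum(cross_entropy(columns[k], k) for k in range(n)) / n
    return (image_to_text + text_to_image) / 2


def sinkhorn(cost: list, epsilon: float = 0.1, iterations: int = 200) -> list:
    """Entropy-regularized optimal transport plan between uniform marginals."""
    rows, cols = len(cost), len(cost[0])
    a, b = [1.0 / rows] * rows, [1.0 / cols] * cols
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iterations):
        # Alternately rescale rows and columns so the plan matches the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(cols)] for i in range(rows)]


# Toy 2-D "image" and "caption" embeddings (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce(images, captions))        # low: matching pairs are the most similar
print(info_nce(images, captions[::-1]))  # higher: the pairs are swapped
cost = [[1.0 - cosine(i, t) for t in captions] for i in images]
plan = sinkhorn(cost)  # most mass lands on the diagonal (the matching pairs)
```

Contrastive learning aligns whole image and caption embeddings, while the transport plan gives a soft correspondence between elements (for example, image regions and caption tokens) that can itself be turned into an alignment loss.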

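Background and challenges

Background overview

NYK-MS is motivated by the need to understand figurative expressions such as metaphor and sarcasm, especially in the memes popular online and among young people. It was created in 2024 by a research team at the State Key Laboratory of Multimedia Information Processing, Peking University, to provide a new benchmark for multimodal metaphor and sarcasm understanding. The dataset contains 1,583 samples for metaphor understanding and 1,578 for sarcasm understanding, and all seven tasks were annotated in detail by at least three annotators. The team used multiple annotation rounds to improve consistency and quality, and used a graphical user interface (GUI) and GPT-4V to speed up annotation. The dataset gives multimodal metaphor and sarcasm research a new source of data and a new benchmark.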

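Current challenges

The main challenges for NYK-MS are:
1) In the zero-shot setting, large language models (LLMs) and large multimodal models (LMMs) perform poorly on the classification tasks.
2) Even with augmentation and calibration methods, traditional pre-trained models still leave room for improvement on NYK-MS.
3) The dataset is relatively small, which may limit how well models generalize.
4) The dataset covers only the image and text modalities, so it cannot be used for video, audio, or similar tasks.
5) The content consists of cartoons and captions, so a model may perform worse when reasoning over other kinds of content, such as tweets.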

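Common scenarios

Typical use cases

NYK-MS is intended for multimodal metaphor and sarcasm understanding. It provides a rich set of cartoon-caption pairs together with understanding tasks aimed at metaphor and sarcasm: whether a sample contains metaphor or sarcasm, which word or object carries it, what the sarcasm targets, and why the sample is metaphorical or sarcastic. Through these annotations and the accompanying experiments, NYK-MS provides data support for research on multimodal metaphor and sarcasm understanding.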
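The first of these tasks, deciding whether a sample contains metaphor or sarcasm, is binary classification, and the usage notes name accuracy, recall, and F1 as metrics. The source does not say whether the metrics are computed for the positive class or macro-averaged; the sketch below assumes positive-class precision, recall, and F1 (1 = "contains sarcasm"), and the function name and toy labels are hypothetical.

```python
def binary_metrics(predictions: list, labels: list, positive: int = 1) -> tuple:
    """Accuracy, plus precision, recall and F1 for the positive class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally long, non-empty prediction and label lists")
    pairs = list(zip(predictions, labels))
    accuracy = sum(p == g for p, g in pairs) / len(pairs)
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    # Guard each ratio so an empty denominator yields 0.0 instead of an error.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical labels for "does this cartoon-caption pair contain sarcasm?" (1 = yes).
gold = [1, 0, 0, 1, 1]
pred = [1, 1, 0, 0, 1]
print(binary_metrics(pred, gold))  # accuracy 0.6; precision, recall, F1 all about 2/3
```

Reporting F1 alongside accuracy matters when the positive and negative classes are unbalanced, since a model that always answers "no" can still reach high accuracy.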

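Academic problems addressed

NYK-MS targets two problems in multimodal metaphor and sarcasm understanding: annotation consistency and model performance. An optimized annotation workflow improves consistency, and the accompanying model experiments explore ways to raise performance. The dataset also addresses limitations of earlier metaphor and sarcasm datasets, such as a single annotation format and limited data sources.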

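Practical applications

NYK-MS has broad potential for practical use, for example in building intelligent dialogue systems, sentiment analysis tools, and text generation models. Models trained on NYK-MS could better understand users' metaphorical and sarcastic expressions, making human-computer interaction more natural and accurate.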

Recent research on the dataset