New Yorker Cartoon Caption Contest Dataset
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/yguooo/cartoon-caption-generation
Description:
This is a multimodal preference dataset containing more than 250 million human ratings of 2.2 million captions, gathered over eight years of crowdsourced rating for The New Yorker's weekly Cartoon Caption Contest. It supports the development of multimodal large language models and the evaluation of preference-based fine-tuning algorithms, with a focus on generating and evaluating humorous captions.
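The description does not spell out the file schema, so the following is only a minimal sketch of how per-caption ratings could be turned into (chosen, rejected) pairs of the kind consumed by preference fine-tuning methods. It assumes each caption comes with vote counts on a three-point scale (unfunny / somewhat funny / funny) mapped to scores 1/2/3. The record type `CaptionRatings`, the function `preference_pairs`, the score mapping, and the `min_votes` / `min_gap` thresholds are all hypothetical and are not the repository's actual format or the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class CaptionRatings:
    """Hypothetical per-caption record (assumed schema, not the released one)."""
    contest_id: int
    caption: str
    unfunny: int          # assumed count of "unfunny" votes
    somewhat_funny: int   # assumed count of "somewhat funny" votes
    funny: int            # assumed count of "funny" votes

    @property
    def n_votes(self) -> int:
        return self.unfunny + self.somewhat_funny + self.funny

    @property
    def mean_score(self) -> float:
        # Assumed mapping: unfunny=1, somewhat funny=2, funny=3, averaged over votes.
        if self.n_votes == 0:
            return 0.0
        total = 1 * self.unfunny + 2 * self.somewhat_funny + 3 * self.funny
        return total / self.n_votes


def preference_pairs(records, min_votes=10, min_gap=0.2):
    """Build (contest_id, chosen, rejected) caption pairs within each contest.

    Captions with fewer than `min_votes` ratings are skipped, and a pair is
    kept only if the two mean scores differ by at least `min_gap`.
    """
    by_contest = {}
    for r in records:
        if r.n_votes >= min_votes:
            by_contest.setdefault(r.contest_id, []).append(r)

    pairs = []
    for contest_id, recs in sorted(by_contest.items()):
        for a, b in combinations(recs, 2):
            if abs(a.mean_score - b.mean_score) < min_gap:
                continue  # too close to call a clear preference
            chosen, rejected = (a, b) if a.mean_score > b.mean_score else (b, a)
            pairs.append((contest_id, chosen.caption, rejected.caption))
    return pairs


# Toy usage with made-up counts (not real contest data).
records = [
    CaptionRatings(1, "A", unfunny=2, somewhat_funny=3, funny=15),
    CaptionRatings(1, "B", unfunny=15, somewhat_funny=4, funny=1),
    CaptionRatings(1, "C", unfunny=14, somewhat_funny=5, funny=1),
    CaptionRatings(2, "D", unfunny=1, somewhat_funny=1, funny=1),  # too few votes
]

if __name__ == "__main__":
    for pair in preference_pairs(records):
        print(pair)
```

Pairing only within a contest keeps each comparison tied to the same cartoon, and the vote and gap thresholds are one simple way to drop noisy comparisons; other aggregation schemes (for example, confidence-interval based ranking) would be equally plausible choices.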
Provided by:
The New Yorker



