
New Yorker Cartoon Caption Contest Dataset

Indexed from arXiv on 2025-09-30
Download link:
https://github.com/yguooo/cartoon-caption-generation
Description:
A multimodal preference dataset containing over 250 million human ratings of 2.2 million captions, collected from crowdsourced ratings for The New Yorker's weekly cartoon caption contest over the past eight years. It supports the development of multimodal large language models and the evaluation of preference-based fine-tuning algorithms, with a focus on generating and evaluating humorous captions.
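
The description mentions preference-based fine-tuning, which typically needs (chosen, rejected) pairs rather than raw ratings. The following is a minimal sketch of one way such pairs could be derived from per-caption rating summaries. The record layout (`contest_id`, `text`, `mean_score`, `num_ratings`), the helper name `build_preference_pairs`, and the thresholds are all hypothetical and chosen for illustration; they are not taken from the dataset's actual schema or from the authors' code.

```python
from itertools import combinations
from typing import NamedTuple


class RatedCaption(NamedTuple):
    # Hypothetical record layout; the real dataset's fields may differ.
    contest_id: int
    text: str
    mean_score: float   # average crowd rating for this caption
    num_ratings: int    # how many ratings the average is based on


class PreferencePair(NamedTuple):
    contest_id: int
    chosen: str
    rejected: str


def build_preference_pairs(captions, min_ratings=10, min_gap=0.2):
    """Pair captions from the same contest whose mean scores differ clearly.

    min_ratings and min_gap are illustrative thresholds, not recommended values.
    """
    # Group captions by contest, dropping those with too few ratings.
    by_contest = {}
    for caption in captions:
        if caption.num_ratings >= min_ratings:
            by_contest.setdefault(caption.contest_id, []).append(caption)

    pairs = []
    for contest_id, group in by_contest.items():
        for a, b in combinations(group, 2):
            # Skip pairs whose scores are too close to call.
            if abs(a.mean_score - b.mean_score) < min_gap:
                continue
            winner, loser = (a, b) if a.mean_score > b.mean_score else (b, a)
            pairs.append(PreferencePair(contest_id, winner.text, loser.text))
    return pairs


if __name__ == "__main__":
    # Placeholder records, not real contest data.
    demo = [
        RatedCaption(1, "caption A", 1.8, 50),
        RatedCaption(1, "caption B", 1.2, 40),
    ]
    for pair in build_preference_pairs(demo):
        print(pair)
```

Pairing only within a contest keeps both captions tied to the same cartoon, and the score-gap filter is one simple way to avoid training on near-ties.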
Provider:
The New Yorker