
askoepke/wit_1m_recaptioned

Source: Hugging Face. Updated 2026-04-29. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/askoepke/wit_1m_recaptioned
Resource summary:
An image–text dataset derived from Wikipedia-based Image Text (WIT), pairing each image with its original WIT caption and a Gemini-generated caption. It has two configs: `wit_1024`, a fixed set of 1,024 query samples used for alignment evaluation, and `wit_1m`, a gallery of 1,000,000 samples drawn from WIT, deduplicated by perceptual hash and by caption text, with the 1,024 query samples excluded. Columns: `image` (JPEG), `original_caption` (from WIT), `url` (source image URL on Wikimedia Commons), and `gemini_caption` (a description of roughly 500 words). All images were recaptioned with `gemini-3-flash-preview`, using a prompt that asks for detailed descriptions grounded only in visual facts. Source: Wikipedia-based Image Text (WIT) dataset (Srinivasan et al., 2021).
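The configs and columns described above could be read with the Hugging Face `datasets` library roughly as follows. This is a minimal sketch, not an official loader: the repository id, config names, and column names are taken from the summary, while the split name `"train"` and the helper names (`stream_rows`, `caption_word_counts`, `missing_columns`) are assumptions introduced for illustration. Streaming is used so that the 1,000,000-sample gallery is not downloaded in full.

```python
from typing import Any, Dict, Iterator, List

REPO_ID = "askoepke/wit_1m_recaptioned"
# Config and column names as listed in the resource summary.
CONFIGS = ("wit_1024", "wit_1m")
COLUMNS = ("image", "original_caption", "url", "gemini_caption")


def stream_rows(config: str = "wit_1024", split: str = "train") -> Iterator[Dict[str, Any]]:
    """Lazily yield rows from one config.

    The split name "train" is an assumption; check the dataset page
    for the split names that are actually published.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported here so the helpers below work without the library installed.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, name=config, split=split, streaming=True)
    yield from dataset


def caption_word_counts(row: Dict[str, Any]) -> Dict[str, int]:
    """Count whitespace-separated words in the two caption columns."""
    return {
        "original_caption": len(str(row["original_caption"]).split()),
        "gemini_caption": len(str(row["gemini_caption"]).split()),
    }


def missing_columns(row: Dict[str, Any]) -> List[str]:
    """Return the expected column names that are absent from a row."""
    return [name for name in COLUMNS if name not in row]


# Example usage (requires network access and the `datasets` package):
#
#     for row in stream_rows("wit_1024"):
#         print(row["url"], caption_word_counts(row))
#         break
```

Comparing the two word counts per row is one quick way to check the stated length of the generated descriptions against the original WIT captions.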
Provider:
askoepke