flickr8k

Hugging Face | Updated 2026-03-25 | Indexed 2026-03-26

Download link:

https://huggingface.co/datasets/intro/flickr8k

Resource summary:

Flickr8k Captions With Splits is an image-text paired dataset for image captioning and text generation. It repackages the Flickr8k image-text corpus in Hugging Face's `imagefolder` layout, with one `metadata.csv` file per split. It contains 8000 images, each paired with five captions, divided into a training set (6000 images), a development set (1000 images), and a test set (1000 images). The main fields are `image` (the image file), `file_name` (the image filename), `split` (one of train, dev, or test), and five caption fields (`caption_0` to `caption_4`). Split assignments come from `Flickr_8k.trainImages.txt`, `Flickr_8k.devImages.txt`, and `Flickr_8k.testImages.txt`; caption texts come from `Flickr8k.token.txt`. The dataset is released under the CC0 license (public domain).

Created:

2026-03-20

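Summary of original information

Flickr8k Captions With Splits: dataset overview

Basic information

Dataset name: Flickr8k Captions With Splits
Hosted at: https://huggingface.co/datasets/intro/flickr8k
Language: English
License: CC0-1.0 (public domain)
Task categories: image-to-text, text generation
Size category: 1K<n<10K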

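Dataset structure

The dataset is repackaged in the Hugging Face `imagefolder` layout, with one `metadata.csv` file per split.
There are three split folders: `train/`, `dev/`, and `test/`.
Each folder contains the image files and one `metadata.csv` file.
Each row of `metadata.csv` represents one image and its five captions.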
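
The layout described above can be read with the standard library alone. The following is a minimal sketch, assuming the `metadata.csv` header uses the column names listed in this overview; the sample file name and captions are invented for illustration and do not come from the dataset.

```python
import csv
import io

# Hypothetical contents of train/metadata.csv. Only the column names come
# from the dataset overview; the file name and captions are made up.
SAMPLE_METADATA = """file_name,split,caption_0,caption_1,caption_2,caption_3,caption_4
example_0001.jpg,train,"A dog runs.","A dog, running.","Dog on grass.","A brown dog.","The dog plays."
"""


def read_metadata(text):
    """Parse metadata.csv text into a list of dicts, one dict per image."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_metadata(SAMPLE_METADATA)
for row in rows:
    print(row["file_name"], row["split"], row["caption_0"])
```

For a real split folder, the same function could be fed the text of `train/metadata.csv` opened with `newline=""`, as the `csv` module recommends.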

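Data features (columns)

`image`: the image file, loaded by the Hugging Face Datasets library.
`file_name`: the image filename, stored as a string; `imagefolder` uses it to map each row to its image file.
`split`: the data split, one of train, dev, or test.
`caption_0` to `caption_4`: the five original Flickr8k captions for the image.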
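
One way to picture a row is as a small typed record. The sketch below assumes the column names listed above; the class name `CaptionRow` and its `from_mapping` helper are hypothetical and not part of the dataset or of any library.

```python
from dataclasses import dataclass

# Split values named in the column description.
VALID_SPLITS = {"train", "dev", "test"}


@dataclass(frozen=True)
class CaptionRow:
    """Hypothetical record type for one metadata row (image column omitted)."""

    file_name: str
    split: str
    captions: tuple  # caption_0 .. caption_4, in order

    @classmethod
    def from_mapping(cls, row):
        """Build a CaptionRow from a dict keyed by the column names."""
        if row["split"] not in VALID_SPLITS:
            raise ValueError(f"unexpected split: {row['split']!r}")
        captions = tuple(row[f"caption_{i}"] for i in range(5))
        return cls(row["file_name"], row["split"], captions)


# Invented example values, for illustration only.
example = CaptionRow.from_mapping({
    "file_name": "example_0001.jpg",
    "split": "dev",
    "caption_0": "c0", "caption_1": "c1", "caption_2": "c2",
    "caption_3": "c3", "caption_4": "c4",
})
print(example.split, len(example.captions))
```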

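Dataset statistics

Total images included: 8000
Training images: 6000
Development images: 1000
Test images: 1000
Excluded caption entries: 92 (no split assignment or no image file)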
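
The counts above can be cross-checked with a few lines of arithmetic. This sketch uses only figures stated in this overview (including the 8092 source groups reported under the important notes); the total caption count is derived, and assumes every included image really has all five captions.

```python
# Figures taken from the dataset overview.
SPLIT_SIZES = {"train": 6000, "dev": 1000, "test": 1000}
SOURCE_GROUPS = 8092      # image-caption groups in the original caption file
CAPTIONS_PER_IMAGE = 5

total_images = sum(SPLIT_SIZES.values())
excluded = SOURCE_GROUPS - total_images
total_captions = total_images * CAPTIONS_PER_IMAGE  # derived, not stated in the card

# Share of images in each split.
fractions = {name: size / total_images for name, size in SPLIT_SIZES.items()}

print(total_images, excluded, total_captions)
print(fractions)
```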

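Source mapping

Split assignments come from the files `Flickr_8k.trainImages.txt`, `Flickr_8k.devImages.txt`, and `Flickr_8k.testImages.txt`.
Captions come from the file `Flickr8k.token.txt`, where `#0` to `#4` correspond to `caption_0` to `caption_4`.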
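
The `#0` to `#4` mapping can be sketched as a small parser. This assumes each line of `Flickr8k.token.txt` has the form `<file_name>#<index><TAB><caption>`; the function names are hypothetical, the sample lines are invented, and this is not necessarily how the repackaged dataset was actually built.

```python
def parse_token_line(line):
    """Split 'name.jpg#2<TAB>caption' into (file_name, index, caption)."""
    key, caption = line.rstrip("\n").split("\t", 1)
    file_name, index = key.rsplit("#", 1)
    return file_name, int(index), caption.strip()


def group_captions(lines):
    """Group caption lines by image: {file_name: {"caption_<i>": text}}."""
    grouped = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        file_name, index, caption = parse_token_line(line)
        grouped.setdefault(file_name, {})[f"caption_{index}"] = caption
    return grouped


# Invented sample lines in the assumed format.
sample_lines = [
    "example_0001.jpg#0\tA dog runs across the grass .\n",
    "example_0001.jpg#1\tA brown dog is running .\n",
    "example_0002.jpg#0\tTwo children play in the sand .\n",
]
grouped = group_captions(sample_lines)
print(grouped["example_0001.jpg"])
```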

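Important notes

The original caption file contains 8092 image-caption groups.
This Hugging Face-ready dataset includes only the 8000 images with an explicit split assignment.
Excluded rows are recorded in `excluded_rows.csv`.
Rows whose local image file is missing are recorded in `missing_image_rows.csv`.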
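
The exclusion rule described above can be illustrated as follows. This is a sketch under stated assumptions: the function `assign_splits` and its inputs are hypothetical, it only covers images with no split assignment (not missing image files), and it does not reproduce the actual contents of `excluded_rows.csv`.

```python
def assign_splits(caption_files, split_lists):
    """Assign each captioned file to a split, or mark it as excluded.

    caption_files: iterable of file names that have captions.
    split_lists:   dict mapping split name -> iterable of file names,
                   as read from the three split list files.
    Returns (assigned, excluded): a dict file_name -> split, and a list
    of file names that appear in no split list.
    """
    lookup = {}
    for split, names in split_lists.items():
        for name in names:
            lookup[name] = split

    assigned, excluded = {}, []
    for name in caption_files:
        if name in lookup:
            assigned[name] = lookup[name]
        else:
            excluded.append(name)
    return assigned, excluded


# Invented file names, for illustration only.
assigned, excluded = assign_splits(
    ["a.jpg", "b.jpg", "c.jpg", "orphan.jpg"],
    {"train": ["a.jpg"], "dev": ["b.jpg"], "test": ["c.jpg"]},
)
print(assigned, excluded)
```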

Loading example

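```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub and load all splits.
dataset = load_dataset("intro/flickr8k")

# Inspect the first training example.
print(dataset["train"][0])
```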
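
For captioning work it is often convenient to have one row per (image, caption) pair rather than five caption columns. The helper below is a minimal sketch; `explode_captions` is a hypothetical name, and it assumes the `file_name` and `caption_0` to `caption_4` columns are present after loading, as the column list in this overview states.

```python
def explode_captions(batch, key_column="file_name"):
    """Turn a batch with five caption columns into one row per caption.

    batch is a dict of equal-length lists, the shape that a batched
    map function receives. Returns a dict of lists with two columns.
    """
    out = {key_column: [], "caption": []}
    for i, key in enumerate(batch[key_column]):
        for k in range(5):
            out[key_column].append(key)
            out["caption"].append(batch[f"caption_{k}"][i])
    return out


# Invented batch of two images, for illustration only.
demo_batch = {
    "file_name": ["a.jpg", "b.jpg"],
    **{f"caption_{k}": [f"a-{k}", f"b-{k}"] for k in range(5)},
}
pairs = explode_captions(demo_batch)
print(len(pairs["caption"]), pairs["caption"][:2])
```

With the dataset loaded, the helper could be applied through `dataset["train"].map(explode_captions, batched=True, remove_columns=dataset["train"].column_names)`; removing the original columns is what allows the output to have a different number of rows than the input. Printing the loaded `DatasetDict` first is a quick way to confirm the exact split names it exposes.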

Citation

If you use this dataset, please cite:

```bibtex
@article{hodosh2013framing,
  title={Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics},
  author={Hodosh, Micah and Young, Peter and Hockenmaier, Julia},
  journal={Journal of Artificial Intelligence Research},
  volume={47},
  pages={853--899},
  year={2013},
  url={http://www.jair.org/papers/paper3994.html}
}
```