flickr30k
ModelScope Community. Updated 2026-05-15; indexed 2024-10-12.
Download link: https://modelscope.cn/datasets/lmms-lab/flickr30k
Resource overview:
<p align="center" width="100%">
<img src="https://i.postimg.cc/g0QRgMVv/WX20240228-113337-2x.png" width="100%" height="80%">
</p>
# Large-scale Multi-modality Models Evaluation Suite
> Accelerating the development of large-scale multi-modality models (LMMs) with `lmms-eval`
🏠 [Homepage](https://lmms-lab.github.io/) | 📚 [Documentation](docs/README.md) | 🤗 [Huggingface Datasets](https://huggingface.co/lmms-lab)
# This Dataset
This is a formatted version of [flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/). It is used in our `lmms-eval` pipeline to allow for one-click evaluations of large multi-modality models.
```
@article{young-etal-2014-image,
title = "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions",
author = "Young, Peter and
Lai, Alice and
Hodosh, Micah and
Hockenmaier, Julia",
editor = "Lin, Dekang and
Collins, Michael and
Lee, Lillian",
journal = "Transactions of the Association for Computational Linguistics",
volume = "2",
year = "2014",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q14-1006",
doi = "10.1162/tacl_a_00166",
pages = "67--78",
abstract = "We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations, based on a large corpus of 30K images and 150K descriptive captions.",
}
```
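The abstract's core idea is to compare two expressions by the sets of images they describe (their visual denotations). The minimal Python sketch below illustrates it. The toy denotations, `UNIVERSE_SIZE`, and the helper names are hypothetical. Conditional probability and normalized PMI are one common way to score overlap between sets, so check the paper for its exact definitions.

```python
from math import log

# Hypothetical toy data (not from Flickr30k): each expression maps to its
# visual denotation, i.e. the set of image IDs it describes.
DENOTATIONS: dict[str, set[int]] = {
    "dog": {1, 2, 3, 4},
    "dog runs": {1, 2},
    "animal": {1, 2, 3, 4, 5, 6},
    "man": {5, 7, 8},
}
UNIVERSE_SIZE = 10  # assumed total number of images in the toy corpus


def prob(*exprs: str) -> float:
    """Fraction of all images described by every given expression."""
    images = set.intersection(*(DENOTATIONS[e] for e in exprs))
    return len(images) / UNIVERSE_SIZE


def cond_prob(x: str, y: str) -> float:
    """P(x | y): share of y's images that x also describes."""
    return prob(x, y) / prob(y)


def npmi(x: str, y: str) -> float:
    """Normalized pointwise mutual information of two denotations, in [-1, 1]."""
    joint = prob(x, y)
    if joint == 0.0:
        return -1.0  # the denotations never overlap
    if joint == 1.0:
        return 1.0  # both expressions describe every image
    return log(joint / (prob(x) * prob(y))) / -log(joint)


if __name__ == "__main__":
    print(cond_prob("animal", "dog"), cond_prob("dog", "animal"))
    print(npmi("dog", "animal"), npmi("dog", "man"))
```

Here `cond_prob("animal", "dog")` is 1.0 because every toy dog image is also an animal image, while `cond_prob("dog", "animal")` is lower. This asymmetry is what makes a conditional score useful for entailment-style inference. The symmetric nPMI score instead measures how strongly two denotations co-occur.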
Provided by:
maas
Created:
2024-10-06
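Background overview:
flickr30k contains 30K images and 150K descriptive captions. This formatted version is used to evaluate large multi-modality models. The original paper used the visual denotations of captions (the sets of images they describe) to define similarity metrics for semantic inference.
(Summary compiled by 遇见数据集.)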



