lifehacker777/cc3m-wds

Source: Hugging Face | Updated: 2025-12-12 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/lifehacker777/cc3m-wds
Description:
Conceptual Captions is a dataset consisting of ~3.3M images annotated with captions. In contrast with the curated style of other image caption annotations, Conceptual Captions images and their raw descriptions are harvested from the web, and therefore represent a wider variety of styles. More precisely, the raw descriptions are harvested from the Alt-text HTML attribute associated with web images. To arrive at the current version of the captions, we have developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, with the goal of achieving a balance of cleanliness, informativeness, fluency, and learnability of the resulting captions.
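
To make the alt-text harvesting step concrete, here is a minimal sketch of pulling candidate image/caption pairs out of a page's HTML using only the Python standard library. It is not the pipeline used to build Conceptual Captions: the class `AltTextExtractor`, the helpers `keep_caption` and `extract_candidates`, and the word-count thresholds are all hypothetical names and values chosen for illustration, and the real filters are considerably more involved.

```python
from html.parser import HTMLParser


class AltTextExtractor(HTMLParser):
    """Collect (src, alt) pairs from <img> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src, alt = attr.get("src"), attr.get("alt")
        # Skip images with no source or no alt text at all.
        if src and alt:
            self.pairs.append((src, alt.strip()))


def keep_caption(alt, min_words=3, max_words=50):
    # Assumed toy filter: keep captions within a word-count range.
    # The thresholds are placeholders, not values from the dataset.
    return min_words <= len(alt.split()) <= max_words


def extract_candidates(html):
    """Return (src, alt) pairs whose alt text passes the toy filter."""
    parser = AltTextExtractor()
    parser.feed(html)
    parser.close()
    return [(src, alt) for src, alt in parser.pairs if keep_caption(alt)]
```

Calling `extract_candidates` on a page's HTML string returns a list of `(src, alt)` tuples. Very short alt texts such as "logo" are dropped by the word-count check, which is a crude stand-in for the informativeness filtering the description mentions.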
Provided by:
lifehacker777