lifehacker777/cc3m-wds
Source: Hugging Face. Updated 2025-12-12; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/lifehacker777/cc3m-wds
Description:
Conceptual Captions is a dataset of about 3.3 million images, each annotated with a caption. Unlike the curated style of other image-caption annotations, Conceptual Captions images and their raw descriptions are harvested from the web and therefore represent a wider variety of styles. More precisely, the raw descriptions are taken from the Alt-text HTML attribute associated with web images. To arrive at the current version of the captions, an automatic pipeline was developed that extracts, filters, and transforms candidate image/caption pairs, with the goal of balancing cleanliness, informativeness, fluency, and learnability in the resulting captions.
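To illustrate what harvesting raw descriptions from the Alt-text attribute can look like, here is a minimal sketch using only Python's standard library. It is not the dataset's actual pipeline: the names `AltTextCollector`, `extract_candidates`, and `keep_caption`, and the three-word minimum, are assumptions made for illustration, and the real filtering and transformation steps are considerably more involved.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (src, alt) pairs from <img> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        # An attribute without a value is reported as None, hence the `or ""`.
        alt = (attr_map.get("alt") or "").strip()
        if src and alt:
            self.pairs.append((src, alt))


def extract_candidates(html):
    """Return candidate (image URL, raw caption) pairs found in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


def keep_caption(caption, min_words=3):
    """Toy filter (an assumption, not the real rule): drop very short alt text."""
    return len(caption.split()) >= min_words
```

A usage example: `extract_candidates(page_html)` yields the candidate pairs, and `[p for p in pairs if keep_caption(p[1])]` keeps those whose alt text passes the toy length filter.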
Provided by:
lifehacker777



