Spawning/PD3M
Source: Hugging Face | Updated: 2024-11-19 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Spawning/PD3M
Description:
PD3M is a subset of PD12M containing the 3.3 million image-caption pairs with the highest aesthetic scores. PD12M is described as the largest public domain image-text dataset to date, suited to training foundation models while minimizing copyright concerns. The dataset has two main components: metadata and images. The metadata is provided as a series of parquet files containing each image's URL, caption, dimensions, embeddings, and other fields. The image files are hosted in an AWS S3 bucket.
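Because the metadata and the images are stored separately, a typical workflow reads the parquet shards first and then decides which image URLs to fetch. The following is a minimal sketch of that first step, not the dataset's official loader. The column names (`url`, `caption`, `width`, `height`), the local shard directory, and the `min_side` threshold are all assumptions made for illustration; check the actual parquet schema before relying on them.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical column names: verify against the real parquet schema.
URL_COL, CAPTION_COL, WIDTH_COL, HEIGHT_COL = "url", "caption", "width", "height"


def load_metadata(parquet_dir):
    """Read every *.parquet shard in a local directory into a list of row dicts."""
    import pandas as pd  # requires pandas plus a parquet engine such as pyarrow

    shards = sorted(Path(parquet_dir).glob("*.parquet"))
    if not shards:
        raise FileNotFoundError(f"no parquet shards found in {parquet_dir}")
    frames = [pd.read_parquet(shard) for shard in shards]
    return pd.concat(frames, ignore_index=True).to_dict("records")


def select_pairs(rows, min_side=512):
    """Keep (url, caption) pairs whose shorter image side is at least min_side pixels."""
    pairs = []
    for row in rows:
        if min(row[WIDTH_COL], row[HEIGHT_COL]) >= min_side:
            pairs.append((row[URL_COL], row[CAPTION_COL]))
    return pairs


def local_name(url):
    """Derive a local file name from the last path segment of an image URL."""
    return Path(urlparse(url).path).name
```

With the shards downloaded to a local folder, `select_pairs(load_metadata("metadata/"))` would return the URL and caption of every image that passes the size filter, and `local_name` gives a file name to save each download under. Filtering on the metadata first avoids fetching images that would be discarded anyway.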
Provided by:
Spawning



