Internally Curated Dataset

Name: Internally Curated Dataset
Creator: Internally Curated
License: No description available

Source: arXiv, indexed 2025-09-30

Download link:

https://d-jepa.github.io/t2i

Resource overview:

This dataset contains over 1 billion image-text pairs used to train the D-JEPA⋅T2I model. During curation, we excluded images with an aesthetic score below 5.0 and used OCR tools to filter out images containing text. To keep the data authentic, synthetic data accounts for only about 5% of the full dataset. The dataset is intended primarily for high-resolution image synthesis and text-to-image generation.
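The curation rules described above can be sketched as a small filter. This is only an illustrative sketch, not the authors' pipeline: the `Sample` record, its field names, and the `curate` function are hypothetical, and capping synthetic samples at a fixed fraction is an assumed way to arrive at the roughly 5% share the description reports.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record layout; the real dataset schema is not published here.
    caption: str
    aesthetic_score: float  # assumed output of some aesthetic predictor
    ocr_text: str           # assumed output of some OCR tool ("" if no text found)
    is_synthetic: bool


def curate(samples, min_aesthetic=5.0, max_synthetic_frac=0.05):
    """Apply the three rules from the description to a list of samples."""
    # Rules 1 and 2: drop low-aesthetic images and images with detected text.
    kept = [
        s for s in samples
        if s.aesthetic_score >= min_aesthetic and not s.ocr_text.strip()
    ]
    real = [s for s in kept if not s.is_synthetic]
    synthetic = [s for s in kept if s.is_synthetic]
    # Rule 3 (assumption): cap synthetic samples so that
    # synthetic / (real + synthetic) <= max_synthetic_frac.
    max_synthetic = int(len(real) * max_synthetic_frac / (1 - max_synthetic_frac))
    return real + synthetic[:max_synthetic]
```

At the scale of a billion pairs the same predicates would run as a streaming or distributed job rather than over an in-memory list, but the per-sample logic would look much the same.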

Provided by:

Internally Curated
