umarigan/PD12M-Turkish
Hugging Face | Updated 2024-12-12 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/umarigan/PD12M-Turkish
Resource overview:
The PD12M Turkish dataset is a large dataset containing images and their associated metadata, listed as primarily intended for question-answering tasks. It includes 12,249,454 training samples with a total size of 8,655,889,565 bytes (about 8.66 GB). Each sample contains a text description of the image, a unique identifier, URL, image dimensions, MIME type, hash, license, and source information. The dataset supports both English and Turkish and is described as one of the largest text-to-image datasets in Turkish. It is provided in parquet format and licensed under cdla-permissive-2.0.
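As a rough illustration of how the figures and format described above might be used, the sketch below derives the average size per sample from the quoted totals and shows one possible way to preview a few records. It assumes the third-party Hugging Face `datasets` package and network access to the Hub. The helper names `bytes_per_sample` and `peek` are hypothetical, and no column names are assumed since the listing does not give them.

```python
from itertools import islice

# Figures quoted in the dataset description.
DATASET_ID = "umarigan/PD12M-Turkish"
NUM_TRAIN_SAMPLES = 12_249_454
TOTAL_BYTES = 8_655_889_565


def bytes_per_sample() -> float:
    """Average size per sample, derived from the quoted totals."""
    return TOTAL_BYTES / NUM_TRAIN_SAMPLES


def peek(n: int = 3) -> list:
    """Return the first n training records without a full download.

    Assumption: the `datasets` package is installed and the Hub is
    reachable. The import is lazy so bytes_per_sample() works without it.
    """
    from datasets import load_dataset

    # streaming=True iterates over the remote parquet shards lazily.
    stream = load_dataset(DATASET_ID, split="train", streaming=True)
    return list(islice(stream, n))
```

Calling `peek(3)` and inspecting the keys of the first record is one way to confirm the actual field names before relying on them. The average of roughly 700 bytes per sample suggests the parquet files hold captions and metadata rather than image bytes, though that is an inference from the totals and not something the listing states.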
Provider:
umarigan



