data-archetype/hands_and_typography_1024
Source: Hugging Face. Updated: 2026-04-23. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/data-archetype/hands_and_typography_1024
Description:
This dataset is a recaptioned merge of two source datasets, typography and hands, which contribute 13,167 and 15,425 source samples respectively. The merged dataset contains 28,321 image-text samples stored in bucketed-shards format at 1024-family bucket resolutions. Each sample's caption is stored in that sample's .txt member and is recorded in the shard metadata as caption_gemini_2_5_flash. According to the export summary, 271 exact duplicate image-caption pairs were removed, and there were no empty captions and no source decode/encode errors. The dataset is self-contained: it includes the images, the captions, per-sample JSON metadata, and bucket/shard counts.
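The sample counts quoted in the description can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the published numbers; the variable and function names are illustrative, and it assumes that exact-duplicate removal is the only step that changed the sample count during the merge.

```python
# Counts copied from the dataset description; names here are illustrative.
SOURCE_COUNTS = {"typography": 13_167, "hands": 15_425}
REMOVED_EXACT_DUPLICATES = 271
REPORTED_MERGED_TOTAL = 28_321


def expected_merged_total(source_counts, removed):
    """Sum the source sample counts and subtract the removed duplicates."""
    return sum(source_counts.values()) - removed


computed = expected_merged_total(SOURCE_COUNTS, REMOVED_EXACT_DUPLICATES)

# Assumption: deduplication is the only step that dropped samples.
print("sources combined:", sum(SOURCE_COUNTS.values()))
print("after deduplication:", computed)
print("matches reported total:", computed == REPORTED_MERGED_TOTAL)
```

Under that assumption the figures agree: the two sources add up to 28,592 samples, and removing 271 duplicates leaves the reported 28,321.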
Provider:
data-archetype



