zhili-liu/implicit-concept-dataset
Source: Hugging Face. Updated: 2024-07-16. Indexed: 2024-07-22.
Download link:
https://hf-mirror.com/datasets/zhili-liu/implicit-concept-dataset
Description:
The Implicit Concept Dataset (ICD) is an image-text dataset covering three typical implicit concepts (QR codes, watermarks, and text), designed to reflect real-life situations in which implicit concepts are easily injected. The dataset is divided into two subsets: ICD-QR and ICD-Text. ICD-QR contains 802 image-text pairs, with 80% used for fine-tuning and 20% for testing. In the training subset, QR codes are pasted onto 25% of the images; the QR code side length varies from 1/4 to 1/2 of the image length, the position is random, and the code occasionally overlaps the original content to mimic real-world scenarios. The test images contain no QR codes. ICD-Text uses the training data provided by LAION-Glyph-1M, which contains 1M samples, each image containing text. Its evaluation set consists of an additional 2k text-free images collected from LAION.
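The ICD-QR construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the dataset authors' code: all function names are hypothetical, the 80/20 split is assumed to be a random shuffle, "image length" is taken to mean the shorter image side, and the QR patch is assumed to be square and to lie fully inside the image.

```python
import random


def split_pairs(pairs, train_fraction=0.8, seed=0):
    """Shuffle image-text pairs and split them into (train, test).

    Assumption: the split is a plain random shuffle; the description
    only states the 80% / 20% proportions.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def choose_qr_indices(n_train, qr_fraction=0.25, seed=0):
    """Pick which training images receive a pasted QR code (25% by default)."""
    rng = random.Random(seed)
    k = int(n_train * qr_fraction)
    return set(rng.sample(range(n_train), k))


def sample_qr_box(width, height, rng):
    """Sample (left, top, side) for a square QR patch.

    Assumption: "image length" means the shorter side, so the patch
    side is drawn between 1/4 and 1/2 of min(width, height).
    The position is uniform over all placements that stay inside the
    image; overlap with the original content is therefore possible.
    """
    base = min(width, height)
    side = rng.randint(base // 4, base // 2)
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    return left, top, side


if __name__ == "__main__":
    # Placeholder integer ids stand in for the 802 image-text pairs.
    train, test = split_pairs(range(802))
    qr_ids = choose_qr_indices(len(train))
    rng = random.Random(42)
    print(len(train), len(test), len(qr_ids))
    print(sample_qr_box(640, 480, rng))
```

The returned box could then be handed to an image library to resize a QR code image to `side` pixels and paste it at `(left, top)`; test images would be left unmodified.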
Provider:
zhili-liu



