zhili-liu/implicit-concept-dataset

Name: zhili-liu/implicit-concept-dataset
Creator: zhili-liu
Published: 2024-07-16 10:06:34
License: No description available

Source: Hugging Face. Updated: 2024-07-16. Indexed: 2024-07-22.

Download link:

https://hf-mirror.com/datasets/zhili-liu/implicit-concept-dataset


Description:

The Implicit Concept Dataset (ICD) is an image-text dataset covering three typical implicit concepts (QR codes, watermarks, and text), designed to reflect real-life situations in which implicit concepts are easily injected. The dataset is divided into two subsets: ICD-QR and ICD-Text. ICD-QR contains 802 image-text pairs, with 80% used for fine-tuning and 20% for testing. In the training subset, QR codes are pasted onto 25% of the images; each QR code's length ranges from 1/4 to 1/2 of the image length, and its position is random, occasionally overlapping the original content to mimic real-world scenarios. Test images contain no QR codes. ICD-Text uses the training data provided by LAION-Glyph-1M, which contains 1M samples, each image containing text. Its evaluation set consists of an additional 2k text-free images collected from LAION.
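The ICD-QR construction described above (an 80/20 split, QR codes on 25% of training images, QR length between 1/4 and 1/2 of the image length, random position) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' actual pipeline: the function names, the seeds, the use of rounding, and the choice to measure "image length" as the shorter image side are all assumptions made here.

```python
import random


def split_indices(n_pairs, train_frac=0.8, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    rng = random.Random(seed)
    indices = list(range(n_pairs))
    rng.shuffle(indices)
    n_train = round(n_pairs * train_frac)  # assumption: rounded, not floored
    return indices[:n_train], indices[n_train:]


def choose_qr_images(train_indices, qr_frac=0.25, seed=0):
    """Pick the subset of training images that will receive a QR code."""
    rng = random.Random(seed)
    n_qr = round(len(train_indices) * qr_frac)
    return set(rng.sample(train_indices, n_qr))


def sample_qr_box(img_w, img_h, rng):
    """Sample a square paste region as (left, top, side) in pixels.

    Assumption: "image length" is taken to be the shorter image side,
    so the QR code always fits inside the image.
    """
    base = min(img_w, img_h)
    side = rng.randint(base // 4, base // 2)  # 1/4 to 1/2 of the image length
    left = rng.randint(0, img_w - side)       # random position; may cover content
    top = rng.randint(0, img_h - side)
    return left, top, side


train_idx, test_idx = split_indices(802)
qr_idx = choose_qr_images(train_idx)
box = sample_qr_box(512, 384, random.Random(0))
```

Test indices never appear in `qr_idx`, which mirrors the statement that test images are QR code-free. Actually pasting a QR image into the sampled box would need an imaging library and is left out of this sketch.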

Provided by:

zhili-liu
