Multimodal Dataset for Detecting AI-Generated Deceptive Fashion Product Representations on Pinterest

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://data.mendeley.com/datasets/ydmpkd52kr

Description:

This dataset contains a curated collection of fashion gown product listings collected from Pinterest, designed to support research on AI-generated content, deceptive product representations, and multimodal signaling in visually mediated electronic markets. The dataset integrates visual, textual, and contextual features for each listing, enabling joint analysis of image aesthetics, linguistic framing, and engagement-based platform dynamics.

The data were collected from publicly accessible Pinterest pins linking to gown product listings during a defined sampling period. Listings were identified using keyword-based and category-based queries related to formal gowns and fashion apparel, and were restricted to content intended for commercial or promotional purposes. Only publicly visible pins and associated metadata were collected; no private user data were accessed.

Each observation corresponds to a unique product listing and includes one or more associated images, descriptive text, and engagement metrics (e.g., saves, likes). Listings were subsequently labeled as AI-generated deceptive or non-deceptive (human-created or verifiable) based on a multi-stage verification procedure combining visual plausibility assessment, textual grounding analysis, and external validation of product existence or seller credibility. Ambiguous cases were excluded to preserve label reliability.
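
The labeling step can be illustrated with a minimal Python sketch. The description does not say how the three stage outcomes were combined, only that ambiguous cases were dropped. The three-way verdict per stage, the unanimity rule used here, and every name (`Verdict`, `assign_label`, `label_listings`, the `pin_...` IDs) are hypothetical assumptions for illustration.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Verdict(Enum):
    """Hypothetical outcome of one verification stage."""
    PASS = "pass"        # evidence consistent with a genuine, verifiable product
    FAIL = "fail"        # evidence of AI generation / deceptive representation
    UNCLEAR = "unclear"  # the stage could not reach a decision


def assign_label(visual: Verdict, textual: Verdict, external: Verdict) -> Optional[str]:
    """Combine the three stage verdicts into a label.

    Assumed rule (not from the source): unanimous PASS -> non-deceptive,
    unanimous FAIL -> AI-generated deceptive, anything else -> None (ambiguous).
    """
    verdicts = {visual, textual, external}
    if verdicts == {Verdict.PASS}:
        return "non_deceptive"
    if verdicts == {Verdict.FAIL}:
        return "ai_generated_deceptive"
    return None  # mixed or unclear evidence: treated as ambiguous


def label_listings(
    stage_results: Dict[str, Tuple[Verdict, Verdict, Verdict]]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (labeled listings, IDs excluded as ambiguous)."""
    labeled: Dict[str, str] = {}
    excluded: List[str] = []
    for listing_id, (visual, textual, external) in stage_results.items():
        label = assign_label(visual, textual, external)
        if label is None:
            excluded.append(listing_id)
        else:
            labeled[listing_id] = label
    return labeled, excluded


# Usage with made-up listing IDs.
results = {
    "pin_001": (Verdict.PASS, Verdict.PASS, Verdict.PASS),
    "pin_002": (Verdict.FAIL, Verdict.FAIL, Verdict.FAIL),
    "pin_003": (Verdict.FAIL, Verdict.PASS, Verdict.UNCLEAR),
}
labeled, excluded = label_listings(results)
print(labeled)   # {'pin_001': 'non_deceptive', 'pin_002': 'ai_generated_deceptive'}
print(excluded)  # ['pin_003']
```

A stricter or looser rule, such as a majority vote, would change how many listings end up excluded. The unanimity rule simply makes "exclude ambiguous cases" explicit.
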
Data structure and variables

For each listing, the dataset includes:

1. Visual features extracted from product images, capturing aesthetic complexity, texture regularity, and structural consistency relevant to fashion imagery;
2. Textual features derived from listing descriptions, including measures of abstraction, affective language, and informational specificity;
3. Contextual and control variables, such as price, seller tenure (where available), and basic listing metadata;
4. Engagement indicators reflecting platform-level user interaction with the listing;
5. Binary labels indicating whether the listing was classified as AI-generated deceptive or non-deceptive.
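
The sketch below shows one way to hold a record covering the five variable groups listed above, plus a helper that compares a feature's mean across the two label classes. The field names, feature scales, and label encoding (1 = AI-generated deceptive, 0 = non-deceptive) are hypothetical. They are not the dataset's actual column names, so check the files at the download link for the real schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Listing:
    """Hypothetical per-listing record; field names are illustrative only."""
    listing_id: str
    # Visual features (assumed to be numeric scores)
    aesthetic_complexity: float
    texture_regularity: float
    structural_consistency: float
    # Textual features
    abstraction: float
    affective_language: float
    informational_specificity: float
    # Contextual / control variables (None when not available)
    price: Optional[float]
    seller_tenure_months: Optional[float]
    # Engagement indicators
    saves: int
    likes: int
    # Binary label: 1 = AI-generated deceptive, 0 = non-deceptive
    is_ai_deceptive: int


def class_means(listings: List[Listing], feature: str) -> Dict[int, Optional[float]]:
    """Mean of one feature per label; listings with a missing value are skipped."""
    groups: Dict[int, List[float]] = {0: [], 1: []}
    for item in listings:
        value = getattr(item, feature)
        if value is not None:
            groups[item.is_ai_deceptive].append(value)
    return {label: (mean(vals) if vals else None) for label, vals in groups.items()}


# Usage with three made-up listings (positional order follows the fields above).
data = [
    Listing("a", 0.8, 1.0, 0.4, 0.8, 0.7, 0.2, 120.0, None, 300, 50, 1),
    Listing("b", 0.7, 0.5, 0.6, 0.6, 0.5, 0.4, 80.0, 24.0, 100, 20, 1),
    Listing("c", 0.5, 0.25, 0.9, 0.3, 0.3, 0.8, 150.0, 36.0, 40, 10, 0),
]
print(class_means(data, "texture_regularity"))    # {0: 0.25, 1: 0.75}
print(class_means(data, "seller_tenure_months"))  # {0: 36.0, 1: 24.0}
```

Seller tenure is optional in the description ("where available"), so the helper skips missing values instead of treating them as zero.
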

本数据集收录了从Pinterest平台采集的、经精心筛选整理的时尚礼服商品列表，旨在支撑AI生成内容、欺骗性商品展示以及视觉介导电子市场（visually mediated electronic markets）中多模态信号传递（multimodal signaling）的相关研究。本数据集整合了每条商品列表的视觉、文本与上下文特征，可实现对图像美学、语言框架（linguistic framing）以及基于用户互动的平台动态的联合分析。数据采集自特定采样周期内，所有链接指向礼服商品列表的公开可访问Pinterest钉帖（pin）。商品列表通过针对正式礼服与时尚服饰的关键词及分类查询进行识别，且仅纳入用于商业或推广目的的内容。本次采集仅获取公开可见的钉帖及其关联元数据，未访问任何私人用户数据。每条观测样本对应唯一的商品列表，包含一张或多张关联图像、描述性文本以及互动指标（engagement metrics，如收藏、点赞）。后续通过多阶段验证流程，结合视觉合理性评估（visual plausibility assessment）、文本溯源分析（textual grounding analysis）以及商品真实性或卖家可信度（seller credibility）的外部验证（external validation），将商品列表标记为AI生成欺骗性或非欺骗性（人类创作或可验证）内容。为保障标签可靠性，所有模棱两可的样本均已排除。 ### 数据结构与变量针对每条商品列表，数据集包含以下内容： 1. 从商品图像中提取的视觉特征，涵盖与时尚图像相关的美学复杂度（aesthetic complexity）、纹理规整度（texture regularity）与结构一致性（structural consistency）； 2. 源自商品列表描述文本的特征，包括抽象性、情感性语言（affective language）以及信息特异性（informational specificity）等度量维度； 3. 上下文与控制变量（control variables），例如价格、卖家入驻时长（如可获取）以及基础列表元数据； 4. 互动指标，用于反映平台用户与该商品列表的交互情况； 5. 二元标签（binary labels），用于指示该商品列表是否被归类为AI生成欺骗性内容。

Created:

2026-01-26

5,000+

优质数据集

54 个

任务类型

进入经典数据集