Shopify/product-catalogue
Source: Hugging Face | Updated: 2025-12-12 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/Shopify/product-catalogue
Description:
The Catalogue is a large-scale, multimodal benchmark dataset for product taxonomy classification, featuring real e-commerce products with images, descriptions, and hierarchical category labels. Each sample includes a product image, title, description, brand, and the ground-truth category from Shopify's product taxonomy. The dataset is suited to evaluating vision-language models on real-world product classification and to benchmarking multimodal understanding in e-commerce contexts. It contains 48,289 samples spanning 10,476 unique categories and 28,913 unique brands; 92.9% of products have a description and 98.2% have brand information. Category depth averages 4.5 levels and ranges from 1 to 8 levels. The dataset is split into a train set (38,631 samples, 80%) and a test set (9,658 samples, 20%).
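The split sizes quoted above can be checked with a few lines of Python. This is a minimal sketch, not part of the dataset card: the constants are copied from the description, `split_shares` is an illustrative helper, and `load_splits` is a hypothetical convenience wrapper that assumes the Hugging Face `datasets` package is installed and that the repository exposes splits named `train` and `test`. It is defined but not called here, since it requires network access.

```python
# Figures quoted in the dataset description.
TOTAL = 48_289
SPLITS = {"train": 38_631, "test": 9_658}


def split_shares(splits, total):
    """Return each split's share of the total, as a percentage rounded to one decimal."""
    if sum(splits.values()) != total:
        raise ValueError("split sizes do not add up to the stated total")
    return {name: round(100 * n / total, 1) for name, n in splits.items()}


def load_splits(repo_id="Shopify/product-catalogue"):
    """Hypothetical helper: fetch the dataset from the Hugging Face Hub.

    Assumes the `datasets` package is installed; the split names and column
    names of the returned object are not confirmed by the listing above.
    """
    from datasets import load_dataset  # imported lazily so the check below runs offline

    return load_dataset(repo_id)


shares = split_shares(SPLITS, TOTAL)
print(shares)  # {'train': 80.0, 'test': 20.0}
```

The two splits sum exactly to the stated total, and the 80%/20% figures hold after rounding to one decimal place.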
Provider:
Shopify



