
Korean_Product_Labels_Image_Dataset

Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-11-08.
Download link:
https://modelscope.cn/datasets/Kratos-AI/Korean_Product_Labels_Image_Dataset
Resource overview:
# Korean Product Labels Image Dataset

*This dataset contains a diverse collection of high-resolution images of Korean product labels and packaging. The dataset spans categories such as food, cosmetics, beverages, and household items, designed to support OCR, product classification, and visual-language AI research.*

## Contact

For queries or collaborations related to this dataset, contact:

- anoushka@kgen.io
- abhishek.vadapalli@kgen.io

## Supported Tasks

- **Task Categories**:
  - Image Classification
  - Text Recognition (OCR)
  - Product Detection
- **Supported Tasks**:
  - Text extraction from Korean product labels and packaging
  - Product category classification (food, cosmetics, household goods, etc.)
  - Brand and ingredient recognition
  - Multilingual OCR for bilingual (Korean-English) labels
  - Visual-language model training for retail and e-commerce AI applications

## Languages

- **Primary Language**: Korean
- **Secondary Presence**: English (common in international or export products)

## Dataset Creation

### Curation Rationale

This dataset was curated to help AI systems understand visual and textual content on Korean product packaging. It supports the development of OCR, product recognition, and multilingual retail automation technologies for smart shopping, cataloging, and consumer research.

### Source Data

- **Contributors**: Field photographers and data collectors from retail stores, supermarkets, and e-commerce listings
- **Collection Process**: Product labels were photographed or extracted from publicly available catalog images. All images were manually reviewed to remove barcodes, serial numbers, or any sensitive identifiable information.

### Other Known Limitations

- **Bias**: Certain product categories (e.g., food and cosmetics) are more represented than industrial goods
- **Lighting and Reflection**: Glossy packaging may introduce glare or partial occlusion in some images
- **Language Mix**: Some packaging includes mixed Korean-English or Korean-Chinese text which may affect OCR accuracy

## Intended Uses

### ✅ Direct Use

- Training OCR and product recognition models
- Multimodal research in retail and consumer goods AI
- Text understanding and translation in packaging contexts
- Visual search and product identification for e-commerce applications

### ❌ Out-of-Scope Use

- Commercial replication or rebranding of existing product labels
- Extraction of proprietary design elements for commercial gain
- Any use that violates intellectual property or packaging rights

## License

CC BY 4.0
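
## Example: Flagging Mixed-Script Labels

The Language Mix limitation listed under Other Known Limitations suggests that it may help to separate mixed-script labels before evaluating OCR accuracy. The following is a minimal sketch, not part of the dataset's tooling: it assumes you already have a transcription string for a label (the dataset card does not describe its annotation format), and the function names and the `min_share` threshold are hypothetical choices made for illustration. It counts Hangul, Han, and Latin letters by Unicode range and reports a label as mixed when more than one script makes up a meaningful share of its letters.

```python
from collections import Counter
from typing import Optional


def char_script(ch: str) -> Optional[str]:
    """Return a coarse script name for one character, or None for non-letters."""
    cp = ord(ch)
    # Hangul syllables, Hangul Jamo, and Hangul Compatibility Jamo blocks.
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return "hangul"
    # CJK Unified Ideographs and Extension A (covers common Chinese characters).
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "han"
    # Basic ASCII letters only; accented Latin letters are ignored in this sketch.
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None


def script_counts(text: str) -> Counter:
    """Count letters per script, skipping digits, spaces, and punctuation."""
    return Counter(s for s in map(char_script, text) if s is not None)


def is_mixed(text: str, min_share: float = 0.2) -> bool:
    """True if at least two scripts each account for min_share of the letters.

    min_share is an illustrative threshold (an assumption), not a value
    taken from the dataset card.
    """
    counts = script_counts(text)
    total = sum(counts.values())
    if total == 0:
        return False
    significant = [s for s, n in counts.items() if n / total >= min_share]
    return len(significant) > 1


if __name__ == "__main__":
    for label in ["신라면 Shin Ramyun 辛", "우유", "Green Tea 500ml"]:
        print(label, dict(script_counts(label)), is_mixed(label))
```

Labels flagged this way could be scored as a separate subset, so that accuracy on Korean-only labels is not confounded with accuracy on bilingual ones.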

Provider: maas
Created: 2025-10-14