GLAMI-1M

arXiv, 2022-11-17; updated 2024-06-21
Download link:
https://github.com/glami/glami-1m
Summary:
GLAMI-1M is a large-scale multilingual image-text classification dataset created by GLAMI.cz and Rossum.ai, both based in the Czech Republic. It contains 1.11 million records, each representing a fashion product with an image, a product name, and a product description written in one of 13 languages. The classification task has 191 categories; 75% of the training-set images and all test-set images are manually annotated. The dataset targets product categorization on e-commerce platforms and is suited to research on multilingual image-text classification, text-conditioned image generation, and multilingual machine translation.
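To make the record structure described above concrete, here is a minimal sketch of one record as a Python data structure. It is not the official loader or file schema from the GLAMI-1M repository: the class `ProductRecord`, its field names, the `human_labeled` flag, and the helper `label_stats` are hypothetical names chosen for illustration. Only the counts (191 categories, 13 languages) come from the description.

```python
from collections import Counter
from dataclasses import dataclass

NUM_CATEGORIES = 191  # from the dataset description
NUM_LANGUAGES = 13    # from the dataset description (not enforced below)


@dataclass(frozen=True)
class ProductRecord:
    """Hypothetical in-memory view of one GLAMI-1M record.

    Field names are illustrative assumptions, not the official file schema.
    """
    image_path: str      # path to the product image (assumed local file)
    name: str            # product name
    description: str     # product description in one of the 13 languages
    language: str        # language code (hypothetical representation)
    category_id: int     # class index in [0, NUM_CATEGORIES)
    human_labeled: bool  # whether the record's image was manually annotated

    def __post_init__(self) -> None:
        # Reject labels outside the 191-class label space.
        if not 0 <= self.category_id < NUM_CATEGORIES:
            raise ValueError(f"category_id out of range: {self.category_id}")

    def text_input(self) -> str:
        # One simple way to form the text side of an image-text classifier:
        # join the product name and description.
        return f"{self.name}. {self.description}"


def label_stats(records: list[ProductRecord]) -> tuple[Counter, float]:
    """Count records per category and the share that are human-labeled."""
    per_category = Counter(r.category_id for r in records)
    labeled = sum(r.human_labeled for r in records)
    share = labeled / len(records) if records else 0.0
    return per_category, share
```

Computing the human-labeled share per split in this way is one possible check that a loaded training split matches the stated 75% manual-annotation rate; the actual field that marks annotated records in the released files should be taken from the repository, not from this sketch.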
Provided by:
Czech Republic
Created:
2022-11-17