JDSUM: A Multimodal Product Summarization Dataset
Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2024-03-05
Download link:
https://www.nbsdc.cn/general/dataDetail?id=64edc979bb16e07753c35b96&type=1
Resource description:
JDSUM is a multimodal product description generation dataset built for multimodal product display in e-commerce scenarios. It contains approximately 450,000 samples, each a triplet of <detailed product description text, product image, product summary>. The description texts and images come from product data on JD.com and cover three categories: home appliances, apparel, and luggage & bags. Product information is uploaded by merchants and undergoes compliance review by JD.com. The product summaries were written manually by thousands of experts and strictly reviewed by the e-commerce platform's review team to ensure quality. The dataset totals approximately 97 GB and was collected in May 2021.
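The triplet structure described above can be pictured with a small sketch. The page does not document the on-disk format, so the class name, field names, and the `category` field below are all assumptions made for illustration; only the three category names and the size figures (about 97 GB, about 450,000 samples) come from the description.

```python
from dataclasses import dataclass
from typing import List

# The three product categories named in the dataset description.
CATEGORIES = ("home appliances", "apparel", "luggage & bags")


@dataclass
class JDSumSample:
    """Hypothetical in-memory view of one JDSUM triplet.

    Field names are assumptions; the actual file layout is not documented here.
    """
    description: str        # detailed product description text
    image_paths: List[str]  # path(s) to the product image(s)
    summary: str            # human-written product summary
    category: str           # assumed extra field: one of CATEGORIES


def average_sample_size_kb(total_gb: float = 97.0, n_samples: int = 450_000) -> float:
    """Rough average footprint per sample in KB, using decimal units (1 GB = 1,000,000 KB)."""
    return total_gb * 1_000_000 / n_samples


if __name__ == "__main__":
    sample = JDSumSample(
        description="(detailed description text)",
        image_paths=["(path to image)"],
        summary="(short summary)",
        category="apparel",
    )
    print(sample.category in CATEGORIES)
    print(round(average_sample_size_kb()))
```

With the stated totals, the average works out to roughly 216 KB per sample, which suggests that images account for most of the volume.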
Provider:
Beijing Jingdong Shangke Information Technology Co., Ltd. (北京京东尚科信息技术有限公司)
Collected summary
Dataset introduction

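Background and challenges
Background overview
JDSUM is a multimodal product summarization dataset for product description generation in e-commerce scenarios. It contains approximately 450,000 samples, each consisting of a detailed product description text, a product image, and a human-written product summary, and covers three categories: home appliances, apparel, and luggage & bags. The data comes from JD.com and has passed compliance review and strict expert review to ensure quality. The dataset totals approximately 97 GB, was collected in May 2021, and is suited to research on multimodal natural language processing.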
The above content was collected and summarized by 遇见数据集.



