Chinese Commodity Knowledge Graph Dataset (中文商品知识图谱数据集)
Indexed from 国家基础学科公共科学数据中心 (National Basic Science Data Center) on 2024-03-05
Download link:
https://www.nbsdc.cn/general/dataDetail?id=64ef84b0bb16e0591d02535d&type=1
Resource description:
To study methods for aggregating unstructured data, the project built a Chinese commodity knowledge graph dataset that supports research on systems for automated knowledge graph construction. Such a system takes product pages from e-commerce platforms as its external data source and passes them through four stages: data processing, knowledge processing, knowledge management, and cognitive services. The resulting commodity knowledge graph provides cognitive service capabilities to applications in a range of scenarios.
The data processing layer takes raw data as input and turns it into high-quality data. That data then enters the knowledge processing layer, where a series of knowledge processing steps produces a high-quality knowledge graph.
The dataset was extracted from the web pages of 30,205 products in two product groups: small household appliances, and apparel and footwear. It covers 231 product categories. Key category and attribute information was extracted to build a commodity knowledge graph in a unified representation, containing 759 distinct commodity attributes and 404,657 (commodity, attribute, attribute value) triples.
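The triple representation described above can be pictured with a minimal Python sketch. The source does not state the file format or field names of the dataset, so the `Triple` type, the `group_by_product` helper, and the sample rows below are all hypothetical and serve only as an illustration. The final line derives the average number of triples per product from the totals reported in the description.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (commodity, attribute, attribute value) triple. Field names are assumed."""
    product: str    # commodity identifier or title
    attribute: str  # attribute name
    value: str      # attribute value


# Hypothetical sample rows for illustration; not taken from the dataset.
SAMPLE = [
    Triple("kettle-001", "capacity", "1.7 L"),
    Triple("kettle-001", "material", "stainless steel"),
    Triple("sneaker-042", "upper material", "mesh"),
]


def group_by_product(triples):
    """Collect attribute/value pairs under each product."""
    graph = defaultdict(dict)
    for t in triples:
        graph[t.product][t.attribute] = t.value
    return dict(graph)


graph = group_by_product(SAMPLE)

# Average triples per product implied by the reported totals
# (404,657 triples over 30,205 products).
avg_triples_per_product = 404_657 / 30_205
```

Under these totals, each product carries a little over 13 attribute triples on average.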
Provider:
Beijing Jingdong Shangke Information Technology Co., Ltd. (北京京东尚科信息技术有限公司)
Collected summary
Dataset introduction

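Background and challenges
Background overview
This is a Chinese commodity knowledge graph dataset built from e-commerce product page data and intended to support research on automated knowledge graph construction systems. It covers 30,205 products in the small household appliance and apparel and footwear groups, with 231 product categories and 759 commodity attributes, represented as 404,657 triples. The data size is 23.17 MB. The dataset was created by Beijing Jingdong Shangke Information Technology Co., Ltd. as part of a National Key R&D Program project, and is suited to applications in natural language processing and related fields.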
The summary above was collected and generated by 遇见数据集.



