
lakritidis/product-matching

Source: Hugging Face. Updated 2024-05-09; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/lakritidis/product-matching
Resource description:
---
license: gpl
task_categories:
- text-classification
language:
- en
size_categories:
- n<1K
---

### Dataset Description

This repository offers an ideal ground for evaluating product matching algorithms and clustering/classification models. The dataset contains e-commerce data: product IDs, product titles, and the category of each product. The data can also be applied to any problem that involves text or short-text mining.

The data originates from PriceRunner, a popular product comparison platform. It includes 35,311 products from 10 categories, provided by 306 different vendors. It was collected by a focused Web crawler developed specifically for this purpose.

- **Curated by:** Leonidas Akritidis
- **Language(s) (NLP):** English
- **License:** GPL 2.0

## Uses

Product matching, classification, clustering

### Direct Use

Electronic stores, product comparison platforms, price comparison applications, e-commerce systems

## Dataset Structure

The CSV file comprises 7 columns:

1. Product ID: The ID of the product.
2. Product Title: The title of the product, as provided by the vendor/e-shop.
3. Vendor ID: The e-shop/vendor that sells this product.
4. Cluster ID: All products with the same cluster ID correspond to the same product entity. For example, the first 23 records correspond to the same product. Also useful for clustering algorithms.
5. Cluster Label: The product title as provided by the product comparison platform (PriceRunner or Skroutz).
6. Category ID: The class (category) of a product. Useful for classification models.
7. Category Label: The class label.

## Citations

Researchers are kindly requested to cite the following articles in their papers:

1. L. Akritidis, A. Fevgas, P. Bozanis, C. Makris, "A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles", Artificial Intelligence Review (Springer), pp. 1-44, 2020.
2. L. Akritidis, P. Bozanis, "Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations", In Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-10, 2018.
3. L. Akritidis, A. Fevgas, P. Bozanis, "Effective Product Categorization with Importance Scores and Morphological Analysis of the Titles", In Proceedings of the 30th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 213-220, 2018.

## Contact

Leonidas Akritidis, lakritidis@ihu.gr
Provider:
lakritidis
Summary of original information

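Dataset overview

Dataset name: not specified
License: GPL 2.0
Task category: text classification
Language: English
Size category: n<1K

Data source: PriceRunner, a popular product comparison platform.
Data content: 35,311 products from 10 categories, provided by 306 different vendors.
Data structure: 7 fields: product ID, product title, vendor ID, cluster ID, cluster label, category ID, category label.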
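
As a rough illustration of how the seven fields fit together, the sketch below parses a few CSV rows with Python's standard `csv` module, groups product titles by cluster ID, and counts products per category label. The header names are assumed to match the field names listed above, and the sample rows are invented for illustration; neither is taken from the actual file, whose headers may differ.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; they are NOT taken from the dataset.
# The header names are an assumption based on the seven fields listed above.
SAMPLE_CSV = """Product ID,Product Title,Vendor ID,Cluster ID,Cluster Label,Category ID,Category Label
1,acme phone x1 64gb black,10,100,Acme Phone X1 64GB,2000,Mobile Phones
2,Acme X1 (64 GB) - Black,11,100,Acme Phone X1 64GB,2000,Mobile Phones
3,acme x1 128gb,10,101,Acme Phone X1 128GB,2000,Mobile Phones
4,Bolt 55in 4K TV,12,200,Bolt 55 4K TV,2001,TVs
"""


def group_by_cluster(rows):
    """Map each Cluster ID to the list of product titles that share it."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["Cluster ID"]].append(row["Product Title"])
    return dict(clusters)


def count_by_category(rows):
    """Count how many products carry each Category Label."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["Category Label"]] += 1
    return dict(counts)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clusters = group_by_cluster(rows)
categories = count_by_category(rows)

for cluster_id, titles in clusters.items():
    print(cluster_id, titles)
print(categories)
```

To try this on the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and adjust the header names to whatever the downloaded CSV actually uses.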

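Main uses: product matching, classification, clustering.
Direct applications: electronic stores, product comparison platforms, price comparison applications, e-commerce systems.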
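
One possible way to use the cluster IDs as ground truth when evaluating a product matching or clustering method is a pairwise comparison: two products count as a true match when they share a cluster ID, and a predicted match when the method assigns them the same label. The sketch below computes pairwise precision, recall, and F1 under that assumption. It is not the evaluation protocol of the cited papers; the function names and the toy labels are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose labels are equal."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_scores(true_labels, predicted_labels):
    """Compute pairwise precision, recall, and F1 of a predicted grouping."""
    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(predicted_labels)
    hits = len(true_pairs & pred_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels invented for illustration: four products, ground-truth
# cluster IDs on the left, a hypothetical method's output on the right.
true_ids = [100, 100, 101, 200]
pred_ids = ["a", "a", "a", "b"]

precision, recall, f1 = pairwise_scores(true_ids, pred_ids)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Enumerating all pairs is quadratic in the number of products, so on the full 35,311 records this would visit hundreds of millions of pairs; the sketch is meant for small samples or per-category subsets.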

Dataset maintainer: Leonidas Akritidis
Contact: lakritidis@ihu.gr
