Product Datasets from the MWPD2020 Challenge at the ISWC2020 Conference (Task 1)

Name: Product Datasets from the MWPD2020 Challenge at the ISWC2020 Conference (Task 1)
Creator: Mannheim University Library
Published: 2024-06-21 20:12:04
License: No description available

DataCite Commons | Updated: 2024-06-21 | Indexed: 2024-07-13

Download link:

https://madata.bib.uni-mannheim.de/352


Resource description:

The goal of Task 1 of the Mining the Web of Product Data Challenge (MWPD2020) was to compare the performance of methods for identifying offers for the same product from different e-shops. The datasets provided to the participants contain product offers from different e-shops in the form of product pairs from the product category computers, each with a binary label ("match" or "no match"). The data is available as training, validation and test sets for machine learning experiments. The training set consists of ~70K product pairs that were labeled automatically, using product identifiers marked up on the web as weak supervision. The validation set contains 1,100 manually labeled pairs. The test set, which was used to evaluate the participating systems, consists of 1,500 manually labeled pairs. The test set is intentionally harder than the other sets: it contains more very hard matching cases, and a subset of its pairs poses specific matching challenges, e.g. products for which the training set contains no data or products that have had typos introduced. These subsets can be used to measure the performance of methods on each kind of matching challenge. The data stems from the WDC Product Data Corpus for Large-Scale Product Matching - Version 2.0, which consists of 26 million product offers originating from 79 thousand websites that mark up their offers with the schema.org vocabulary. For more information and download links for the corpus itself, please follow the links below.
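
To make the pair-wise setup described above concrete, the following is a minimal sketch of how a matcher could be scored on labeled pairs. It is an illustration only, not the challenge's baseline or official evaluation: the toy offer titles are made up, the token-overlap (Jaccard) matcher and its threshold are hypothetical, and the choice of precision, recall and F1 on the "match" class is an assumption, since the description does not name a metric. The real files and their field names are not reproduced here.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two offer titles."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def predict(pairs: List[Tuple[str, str]], threshold: float = 0.5) -> List[int]:
    """Label a pair 1 ("match") if title overlap reaches the threshold, else 0."""
    return [1 if jaccard(left, right) >= threshold else 0 for left, right in pairs]


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive ("match") class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up toy pairs (NOT taken from the dataset); 1 = "match", 0 = "no match".
pairs = [
    ("Dell XPS 13 9300 laptop 16GB", "Dell XPS 13 9300 16GB notebook"),
    ("Kingston 8GB DDR4 RAM", "Corsair 650W power supply"),
    ("Samsung 970 EVO 1TB SSD", "Samsung 970 EVO 1TB NVMe M.2"),
    ("Intel Core i7 9700K", "Intel Core i5 9400F"),
]
labels = [1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(labels, predict(pairs))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A surface-overlap matcher like this would be expected to struggle on the harder test cases the description mentions, such as offers with introduced typos, which is the kind of gap the separate challenge subsets are meant to expose.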

Provider:

Mannheim University Library

Created:

2021-01-20
