Amazon-Google
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Amazon-Google
Resource description:
The Amazon-Google dataset for entity resolution derives from the online retailer Amazon.com and the Google Product Search service, which is accessible through the Google Base Data API. It contains 1,363 entities from amazon.com and 3,226 Google products, plus a gold standard (perfect mapping) of 1,300 matching record pairs between the two sources. The attributes common to both sources are product name, product description, manufacturer, and price. The dataset was originally published in the repository of the Database Group at Leipzig University (https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolution). To make results reproducible and performance comparable across different matchers on the Amazon-Google matching task, the dataset has been split into fixed training, validation, and test sets. The fixed splits are available in the CompERBench repository (http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html).
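To give a feel for the scale of the matching task described above, here is a minimal sketch in Python. It uses only the counts and the four shared attributes stated in the description. The `Product` class, the helper functions, the token-Jaccard rule, the 0.5 threshold, and the sample records are all hypothetical and chosen for illustration; they are not part of the dataset or of any published matcher.

```python
from dataclasses import dataclass
from typing import Optional

# Counts taken from the dataset description.
N_AMAZON = 1363
N_GOOGLE = 3226
N_MATCHES = 1300


@dataclass
class Product:
    """One record holding the four attributes shared by both sources."""
    name: str
    description: str
    manufacturer: str
    price: Optional[float]


def candidate_pairs(n_left: int, n_right: int) -> int:
    """Number of pairs a naive matcher compares (full cross product)."""
    return n_left * n_right


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two whitespace-tokenized strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(left: Product, right: Product, threshold: float = 0.5) -> bool:
    """Assumed baseline rule: match when product names overlap enough."""
    return jaccard(left.name, right.name) >= threshold


# Hypothetical records, not taken from the dataset.
amazon_item = Product("acme photo editor 5.0", "photo editing software", "acme", 49.99)
google_item = Product("acme photo editor 5.0 win", "", "", 47.5)
other_item = Product("zeta tax planner deluxe", "", "zeta", 29.99)

total = candidate_pairs(N_AMAZON, N_GOOGLE)
print(total)                      # 4397038
print(N_MATCHES / total)          # fraction of pairs that are true matches
print(is_match(amazon_item, google_item))
print(is_match(amazon_item, other_item))
```

With 1,363 x 3,226 = 4,397,038 candidate pairs and only 1,300 true matches, about 0.03% of all pairs are positives. This imbalance is one reason matchers on this task are usually compared on fixed splits rather than on ad hoc samples.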
Provider:
OpenDataLab
Created:
2022-05-23
Collected summary
Dataset introduction

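Background and challenges
Background overview
The Amazon-Google dataset is a benchmark for entity resolution. It contains product data from Amazon and Google together with the matching record pairs, and it supports performance comparison of matching algorithms.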
The content above was collected and summarized by 遇见数据集.



