ProductMatch.pl
Source: arXiv | Updated: 2022-06-01 | Indexed: 2024-06-21
Download link:
https://github.com/grant-TraDA/mlt4pm
Resource description:
ProductMatch.pl is the first open dataset for product matching in Polish, created by the Faculty of Mathematics and Information Science at Warsaw University of Technology. It is built from product offers in selected categories collected from several online stores and contains approximately 3,600 positive pairs. During its creation, the data was cleaned and formatted to follow the conventions of the Web Data Commons datasets. The dataset is intended as a benchmark for product matching in Polish: it supports comparing the effectiveness of pretrained models and encourages the use of multilingual Transformer models in non-English markets.
Provided by:
Faculty of Mathematics and Information Science, Warsaw University of Technology
Created:
2022-05-31



