Klarna Product Page Dataset
Source: arXiv. Updated: 2024-02-24. Indexed: 2024-06-21.
Download link:
https://github.com/klarna/product-page-dataset
Description:
The Klarna Product Page Dataset is a large-scale, diverse collection of web pages created jointly by KTH Royal Institute of Technology and Klarna. It contains 51,701 manually labeled product pages from 8,175 e-commerce websites. The dataset is designed to mirror the setting of an autonomous shopping assistant that must extract product information across many websites, and it supports research on the web element nomination task using graph neural networks (GNNs) and large language models (LLMs). It spans multiple geographic regions and includes text elements in several languages, making it a resource for research on web automation and computer vision.
Provided by:
KTH Royal Institute of Technology
Created:
2021-11-03



