PLAtE
Source: arXiv | Updated: 2023-06-16 | Indexed: 2024-06-21
Download link:
https://github.com/amazon-science/plate
Description:
PLAtE is a large-scale dataset created by researchers at the University of Virginia and Amazon for information extraction from multi-product review pages in the shopping domain. It contains 52,898 products from 6,694 pages, with 156,014 attributes in total, and is intended to address the shortage of training data for neural models that extract information from semi-structured websites. The dataset was collected and annotated in a multi-stage process and is suited to evaluating and training web extraction models, particularly on the product list segmentation and attribute extraction tasks.
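For a rough sense of scale, the three headline counts in the description can be turned into average densities. The sketch below is only arithmetic on those stated counts; the derived averages are not figures reported by the listing, and the function and key names are hypothetical, chosen for illustration.

```python
# Counts as stated in the dataset description.
PRODUCTS = 52_898
PAGES = 6_694
ATTRIBUTES = 156_014


def density_stats(products: int, pages: int, attributes: int) -> dict:
    """Return average densities derived from the three headline counts.

    These are simple ratios, not statistics published with the dataset.
    """
    return {
        "products_per_page": products / pages,
        "attributes_per_product": attributes / products,
        "attributes_per_page": attributes / pages,
    }


stats = density_stats(PRODUCTS, PAGES, ATTRIBUTES)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

These are averages only; the listing gives no information about how products or attributes are distributed across individual pages.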
Provided by:
University of Virginia
Created:
2022-05-25



