Harvard USPTO Dataset (HUPD)
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/ZoeYou/PatentEval
Description:
The dataset covers English-language utility patent applications filed with the United States Patent and Trademark Office (USPTO) between January 2004 and December 2018. The source data contains both granted and rejected applications, but the evaluation subset keeps only granted patents to ensure the quality of the evaluation data. That subset consists of 400 granted patents distributed evenly across the eight main International Patent Classification (IPC) sections, that is, 50 per section. Annotation was carried out jointly by a patent attorney and a doctoral student. The dataset supports two tasks: generating an abstract from a patent's claims, and generating the next claim.
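The even split of 400 granted patents across eight IPC sections can be illustrated with a small stratified-sampling sketch. This is not the dataset authors' selection procedure, which the description does not specify. The record fields `ipc_section` and `decision`, the value `"ACCEPTED"`, and the function name `balanced_sample` are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# The eight top-level IPC sections are labelled A through H.
IPC_SECTIONS = tuple("ABCDEFGH")


def balanced_sample(records, total=400, seed=0):
    """Draw `total` granted records, split evenly across IPC sections.

    `records` is an iterable of dicts with the hypothetical keys
    "ipc_section" (one of A..H) and "decision" ("ACCEPTED" for granted).
    """
    per_section, remainder = divmod(total, len(IPC_SECTIONS))
    if remainder:
        raise ValueError("total must be divisible by the number of IPC sections")

    # Keep granted applications only, grouped by IPC section.
    by_section = defaultdict(list)
    for rec in records:
        if rec["decision"] == "ACCEPTED" and rec["ipc_section"] in IPC_SECTIONS:
            by_section[rec["ipc_section"]].append(rec)

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = []
    for section in IPC_SECTIONS:
        pool = by_section[section]
        if len(pool) < per_section:
            raise ValueError(f"not enough granted records in section {section}")
        sample.extend(rng.sample(pool, per_section))
    return sample
```

With `total=400` the function draws 50 records from each section. It raises an error instead of silently returning an unbalanced sample when a section has too few granted records.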
Provided by:
Harvard University

The content above was collected and summarized by 遇见数据集.



