
CPC Patent Classification: USPTO-70k-enriched

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6992297
Description:
The patent classification task is a hierarchical multi-label classification problem. A patent document contains four textual fields: `title`, `abstract`, `claims` and `description`. Because patent text is long, most previous work has used only the title, abstract and claims. In the paper, we make use of the description as a more elaborate patent field. For evaluation, we create a new dataset (USPTO-70k-enriched) from the previously released USPTO-70k dataset, which contains title and abstract as patent fields. The dataset is now enriched with four additional text columns, `claims`, `brief-summary`, `fig-desc` and `detail-desc`, where the latter three columns are subfields of the description. Both datasets are created from the bulk data dump provided by the United States Patent and Trademark Office (USPTO), released under CC-BY-4.0. We also release the dataset under the same license, CC-BY-4.0.
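To make the column layout concrete, here is a minimal sketch of how a single record might be handled. It assumes a record is available as a plain dictionary keyed by the column names listed above; the on-disk file format is not stated here, so loading is left out. The helper names `build_description` and `cpc_ancestors` are hypothetical, and the assumption that labels are CPC subclass codes (such as `H04L`) is ours, not something the description confirms.

```python
from typing import Dict, List

# Column names as listed in the dataset description.
DESCRIPTION_SUBFIELDS = ["brief-summary", "fig-desc", "detail-desc"]


def build_description(record: Dict[str, str]) -> str:
    """Join the three description subfields, skipping missing or empty ones."""
    parts = [(record.get(name) or "").strip() for name in DESCRIPTION_SUBFIELDS]
    return "\n\n".join(part for part in parts if part)


def cpc_ancestors(subclass: str) -> List[str]:
    """Expand a CPC subclass code into its section, class and subclass.

    Assumes a four-character subclass code: one section letter,
    a two-digit class, and one subclass letter.
    """
    code = subclass.strip().upper()
    if len(code) != 4:
        raise ValueError(f"expected a 4-character CPC subclass, got {subclass!r}")
    return [code[0], code[:3], code]


# Hypothetical record, for illustration only (not taken from the dataset).
example = {
    "title": "Example title",
    "abstract": "Example abstract",
    "claims": "1. An example claim.",
    "brief-summary": "Summary text.",
    "fig-desc": "",
    "detail-desc": "Detailed text.",
}

full_description = build_description(example)
label_path = cpc_ancestors("H04L")
```

Expanding each label into its ancestors is one common way to expose the hierarchy to a multi-label classifier; whether the paper does this is not stated here.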
Created:
2022-08-15