CPC Patent Classification: USPTO-70k-enriched
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6992297
Description:
The patent classification task is a hierarchical multi-label classification problem. A patent document contains four textual fields: `title`, `abstract`, `claims` and `description`. Because the full text is long, most previous work has focused on the title, abstract and claims. In the paper, we also make use of the description as a more elaborate patent field. For evaluation, we create a new dataset (USPTO-70k-enriched) from the previously released USPTO-70k dataset, which contains title and abstract as patent fields.
The dataset is now enriched with four additional text columns: `claims`, `brief-summary`, `fig-desc` and `detail-desc`, where the latter three columns are subfields of the description. Both datasets are created from the bulk data dump provided by the United States Patent and Trademark Office (USPTO), released under CC-BY-4.0.
We also release the dataset under the same license, CC-BY-4.0.
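
To make the column layout and the hierarchical label structure concrete, the following is a minimal sketch, not the authors' code. It assumes each record is available as a plain dictionary keyed by the column names listed above, and that labels are CPC subclass codes such as `H04L`; the description does not state the file format or the label column, so the helper names `build_description` and `expand_cpc_labels` are hypothetical.

```python
from typing import Dict, List, Set

# Column names taken from the dataset description; the three
# description subfields are joined in this order (an assumption).
DESCRIPTION_SUBFIELDS = ["brief-summary", "fig-desc", "detail-desc"]


def build_description(record: Dict[str, str]) -> str:
    """Join the description subfields, skipping missing or empty ones."""
    parts = [(record.get(name) or "").strip() for name in DESCRIPTION_SUBFIELDS]
    return "\n\n".join(part for part in parts if part)


def expand_cpc_labels(codes: List[str]) -> Dict[str, Set[str]]:
    """Expand CPC subclass codes into their ancestors in the hierarchy.

    A CPC subclass such as 'H04L' belongs to class 'H04' and section 'H',
    so one document-level label implies a label at every level above it.
    """
    levels: Dict[str, Set[str]] = {"section": set(), "class": set(), "subclass": set()}
    for raw in codes:
        code = raw.strip().upper()
        if len(code) < 4:
            raise ValueError(f"expected a CPC subclass code like 'H04L', got {raw!r}")
        levels["section"].add(code[0])
        levels["class"].add(code[:3])
        levels["subclass"].add(code[:4])
    return levels


if __name__ == "__main__":
    # Toy record for illustration only; not taken from the dataset.
    record = {
        "title": "Example title",
        "brief-summary": "Summary text.",
        "fig-desc": "",
        "detail-desc": "Detailed text.",
    }
    print(build_description(record))
    print(expand_cpc_labels(["H04L", "G06F"]))
```

Expanding each label to its ancestors is one common way to build targets for hierarchical multi-label classification; whether the paper trains on all levels or only on subclasses is not stated here.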
Created:
2022-08-15



