
Exploring the Impact of Negative Sampling on Patent Citation Recommendation

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7870196
Resource description:
pcr_patents.csv is the main dataset, built by randomly sampling patents from Google Patents with a Python library. It comprises around 250,000 US patents with their titles, abstracts, and citations. Each patent has about 27 citations on average.

The zip file contains three datasets for training and testing patent citation recommendation systems, all derived from the main dataset. They consist of around 1 million instances, covering both positive and negative samples.

pcr_cpc_negative_sample_data.csv contains negative samples generated from CPC (Cooperative Patent Classification) subclass codes.
pcr_random_negative_sample_data.csv contains negative samples generated randomly.
pcr_sem_sim_negative_sample_data_2.csv contains negative samples generated from nearest-neighbor relations.
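To make the difference between the sampling strategies concrete, the following is a minimal sketch of random and CPC-based negative sampling on a toy citation graph. The patent IDs, CPC codes, data layout, and function names are all hypothetical and do not reflect the actual schema of the CSV files. It also assumes that a CPC-based negative is an uncited patent in the same CPC subclass, which the description does not confirm.

```python
import random

# Toy data: IDs, CPC codes, and layout are hypothetical, not the dataset's schema.
citations = {
    "P1": {"P2", "P3"},
    "P2": {"P3"},
    "P3": set(),
    "P4": {"P1"},
    "P5": {"P4"},
}
cpc_subclass = {"P1": "G06F", "P2": "G06F", "P3": "G06F", "P4": "H04L", "P5": "H04L"}


def random_negatives(patent, k, rng):
    """Sample up to k patents that `patent` does not cite, uniformly at random."""
    candidates = sorted(
        p for p in citations if p != patent and p not in citations[patent]
    )
    return rng.sample(candidates, min(k, len(candidates)))


def cpc_negatives(patent, k, rng):
    """Sample up to k uncited patents in the same CPC subclass (assumed rule)."""
    candidates = sorted(
        p
        for p in citations
        if p != patent
        and p not in citations[patent]
        and cpc_subclass[p] == cpc_subclass[patent]
    )
    return rng.sample(candidates, min(k, len(candidates)))


rng = random.Random(0)

# Positive instances (label 1) come from real citations;
# negative instances (label 0) come from a sampling strategy.
pairs = [(src, dst, 1) for src, cited in citations.items() for dst in sorted(cited)]
pairs += [(src, neg, 0) for src in citations for neg in random_negatives(src, 1, rng)]
```

Under this assumption, CPC-based negatives are topically closer to the query patent than random ones, so they are likely harder for a recommender to tell apart from true citations. A nearest-neighbor variant would presumably pick uncited patents by text similarity instead, which is not sketched here.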

Created:
2023-04-27