Exploring the Impact of Negative Sampling on Patent Citation Recommendation
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7870196
Description:
pcr_patents.csv is the main dataset, built by randomly sampling patents from Google Patents with a Python library. It comprises around 250,000 US patents with their titles, abstracts, and citations. Each patent has about 27 citations on average.
The zip file contains three datasets for training and testing patent citation recommendation systems, all derived from the main dataset. They comprise around 1 million instances, including both positive and negative samples.
pcr_cpc_negative_sample_data.csv contains negative samples generated from CPC (Cooperative Patent Classification) subclass codes.
pcr_random_negative_sample_data.csv contains negative samples generated at random.
pcr_sem_sim_negative_sample_data_2.csv contains negative samples generated from nearest-neighbor relations.
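To make the difference between the three negative sampling strategies concrete, the following is a minimal sketch on toy data, not the authors' generation code. Everything in it is an assumption made for illustration: the `patents` dictionary and its field names are hypothetical, the CPC-based sampler assumes negatives are drawn from the same subclass as the query patent, and word-level Jaccard overlap stands in for whatever similarity measure underlies the nearest-neighbor relation in the actual dataset.

```python
import random

# Toy stand-in for the main dataset. The field names ("cpc", "text", "cites")
# are hypothetical and do not reflect the real column names in pcr_patents.csv.
patents = {
    "P1": {"cpc": "G06F", "text": "neural network image classification", "cites": {"P2"}},
    "P2": {"cpc": "G06F", "text": "image classification with convolution", "cites": set()},
    "P3": {"cpc": "G06F", "text": "neural network training method", "cites": set()},
    "P4": {"cpc": "A61K", "text": "pharmaceutical composition for pain", "cites": set()},
    "P5": {"cpc": "H04L", "text": "packet routing in network", "cites": set()},
}


def candidates(pid):
    """Patents that are neither the query patent nor one of its citations."""
    cited = patents[pid]["cites"]
    return sorted(p for p in patents if p != pid and p not in cited)


def random_negatives(pid, k, rng):
    """Random strategy: draw k non-cited patents uniformly."""
    pool = candidates(pid)
    return rng.sample(pool, min(k, len(pool)))


def cpc_negatives(pid, k, rng):
    """CPC strategy (assumed): draw non-cited patents sharing the query's subclass."""
    pool = [p for p in candidates(pid) if patents[p]["cpc"] == patents[pid]["cpc"]]
    return rng.sample(pool, min(k, len(pool)))


def jaccard(a, b):
    """Word-level Jaccard overlap, a crude stand-in for semantic similarity."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def nearest_neighbor_negatives(pid, k):
    """Nearest-neighbor strategy (assumed): the k most similar non-cited patents."""
    query = patents[pid]["text"]
    pool = candidates(pid)
    pool.sort(key=lambda p: jaccard(query, patents[p]["text"]), reverse=True)
    return pool[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    print("random:", random_negatives("P1", 2, rng))
    print("cpc:", cpc_negatives("P1", 2, rng))
    print("nearest neighbor:", nearest_neighbor_negatives("P1", 2))
```

Under these assumptions, the random sampler yields mostly unrelated patents, while the CPC and nearest-neighbor samplers yield negatives that are topically closer to the query and therefore harder to distinguish from true citations.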
Created:
2023-04-27



