High-value Patent Identification Dataset
Source: DataCite Commons | Updated: 2025-04-27 | Indexed: 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=77691047acc34d189204772a8e1dabc6
Description:
The data come from the Incopat patent database and were retrieved with the search formula "(AP-OR=(Chinese Academy of Sciences)) AND (AD=[20030101 TO 20221231])" (the Chinese-language version of this record writes the applicant term as 中国科学院). The result contains 10917 positive samples and 22652 negative samples. The data are split into three datasets according to International Patent Classification (IPC) code, and each dataset is further divided into train, val, and test sets: train is used for model training, val for model tuning, and test for evaluating model performance. The image data folder contains the abstract drawings of the patents.
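The sample counts in the description imply a class imbalance of roughly one positive for every two negatives. The following is a minimal sketch of that arithmetic, using only the two totals quoted above. The helper name `class_balance` and the inverse-frequency weighting are illustrative assumptions, not something the dataset authors describe, and the per-subset (IPC, train/val/test) counts are not given in the record, so only the overall figures are used.

```python
# Overall sample counts quoted in the dataset description.
N_POSITIVE = 10917
N_NEGATIVE = 22652


def class_balance(n_pos: int, n_neg: int) -> dict:
    """Summarize class balance for a binary dataset (hypothetical helper)."""
    total = n_pos + n_neg
    return {
        "total": total,
        "positive_fraction": n_pos / total,
        "negative_to_positive_ratio": n_neg / n_pos,
        # Inverse-frequency weights, total / (n_classes * class_count).
        # This weighting scheme is an assumption chosen for illustration.
        "weights": {
            "positive": total / (2 * n_pos),
            "negative": total / (2 * n_neg),
        },
    }


stats = class_balance(N_POSITIVE, N_NEGATIVE)
print(stats)
```

Weights like these could be passed to a loss function if the imbalance turns out to matter. Whether it does would need to be checked against the actual per-split counts after download.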
Provider:
Science Data Bank
Created:
2024-09-23



