Python elaboration of Patstat and Orbit data

Indexed by NIAID Data Ecosystem on 2026-03-12

Download link:

https://data.mendeley.com/datasets/gfnhp8r52y

Resource description:

The files in this repository illustrate a patent intelligence approach focused on technical developments in the geographic mapping of the marine environment. A search query run on the Orbit Intelligence database (Questel) returned 315 records, with priority years ranging from 2015 to October 2020. The files allow the user to reproduce the results of a methodology designed to enrich the information obtained from a typical patent search.

The first goal has two parts: associating each patent family in the original dataset with a list of concisely described technical concepts, and producing a list of keywords ranked by their frequency of use in the titles of the whole dataset. Both can be achieved with fairly simple processing in a pair of IPython Notebooks, which generate the MS Excel files included in the repository. This enrichment of the original dataset makes the technical details of the patent data easier to access. Optionally, the Excel files can be imported into the MS Power BI app to build a dynamic layout of tables and charts focused on specific features. For example, the user can filter the patent documents dealing with a specific technical concept, or those containing a specific keyword in their titles.

An additional goal is to zoom in on a restricted number of patent documents selected from the original dataset, by clustering the patents according to patterns of either IPC classification codes or one or more of the 35 technology fields defined by WIPO.
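The description does not show the notebooks' code. As a rough illustration of the keyword-ranking and filtering steps, the sketch below counts title words with Python's `collections.Counter` and selects the families whose title contains a given keyword. The stopword list, the tokenization rule, the function names (`tokenize`, `rank_title_keywords`, `families_with_keyword`) and the sample titles are hypothetical assumptions, not the repository's actual implementation.

```python
import re
from collections import Counter

# Hypothetical stopword list: generic words that carry little technical meaning in titles.
STOPWORDS = {"and", "for", "the", "with", "method", "system", "device"}


def tokenize(title: str) -> list[str]:
    """Lower-case a title and keep alphabetic words of 3+ letters that are not stopwords."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def rank_title_keywords(titles: list[str]) -> list[tuple[str, int]]:
    """Count keyword occurrences across all titles, most frequent first."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def families_with_keyword(titles_by_family: dict[str, str], keyword: str) -> list[str]:
    """Return the IDs of the families whose title contains the keyword."""
    keyword = keyword.lower()
    return [fam for fam, title in titles_by_family.items() if keyword in tokenize(title)]


if __name__ == "__main__":
    # Illustrative titles, not taken from the dataset.
    titles = {
        "FAM1": "Underwater acoustic mapping of the seabed",
        "FAM2": "Seabed mapping system with autonomous vehicle",
        "FAM3": "Autonomous underwater vehicle for bathymetry",
    }
    print(rank_title_keywords(list(titles.values()))[:3])  # [('autonomous', 2), ('mapping', 2), ('seabed', 2)]
    print(families_with_keyword(titles, "seabed"))  # ['FAM1', 'FAM2']
```

The ranked list could then be exported for MS Excel or Power BI, for example with pandas `DataFrame.to_excel`, which needs an Excel writer engine such as openpyxl to be installed.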
Thanks to the dimensionality reduction performed by the t-SNE algorithm, specific patterns of IPC classification codes or technology fields appear as distinct clusters of patent documents. This is a significant improvement over the traditional, less sophisticated approach, in which unambiguously associating a given pattern of IPC classification codes with a well-focused, or even unique, technical topic is a critical issue. With the proposed methodology, sub-pools of patent families can be quickly partitioned into macro-categories based on the technical content of each patent document, and a quantitative analysis of the most representative IPC classification codes or of the predominant technology fields is obtained at the same time. The output of this clustering approach consists of two PDF files (included in the repository). The patent family identifiers within each cluster can be located directly with the search tool in Adobe Acrobat, and each identifier can then be used as input to Patstat to retrieve the corresponding bibliographic data.
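The description does not state how IPC code patterns are turned into t-SNE input. One plausible encoding, assumed here, is a binary (multi-hot) matrix with one row per patent family and one column per distinct IPC code, projected to two dimensions with scikit-learn's `sklearn.manifold.TSNE`. The function names, the sample IPC assignments and the cluster labels are hypothetical. In particular, t-SNE only produces coordinates, so the labels are assumed to come from a separate step such as visual inspection of the plot or a clustering algorithm. The same functions work unchanged if each family is mapped to WIPO technology fields instead of IPC codes.

```python
from collections import Counter, defaultdict


def ipc_vocabulary(ipc_by_family: dict[str, list[str]]) -> list[str]:
    """Sorted list of every distinct IPC code in the dataset (one matrix column each)."""
    return sorted({code for codes in ipc_by_family.values() for code in codes})


def ipc_matrix(ipc_by_family: dict[str, list[str]], vocabulary: list[str]) -> list[list[int]]:
    """Binary (multi-hot) matrix: one row per family, 1 where the family carries that code."""
    column = {code: j for j, code in enumerate(vocabulary)}
    matrix = []
    for codes in ipc_by_family.values():
        row = [0] * len(vocabulary)
        for code in codes:
            row[column[code]] = 1
        matrix.append(row)
    return matrix


def top_ipc_codes(ipc_by_family: dict[str, list[str]], n: int = 5) -> list[tuple[str, int]]:
    """Most representative codes, counting each code at most once per family."""
    counts = Counter(code for codes in ipc_by_family.values() for code in set(codes))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def embed_tsne(matrix: list[list[int]], perplexity: float = 30.0, seed: int = 0):
    """Project the binary matrix to 2-D with t-SNE (needs NumPy and scikit-learn)."""
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.asarray(matrix, dtype=float)
    # scikit-learn requires the perplexity to be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=min(perplexity, len(X) - 1), random_state=seed)
    return tsne.fit_transform(X)


def families_by_cluster(family_ids: list[str], labels: list[int]) -> dict[int, list[str]]:
    """Group family IDs by cluster label, e.g. to look each one up in Patstat."""
    groups = defaultdict(list)
    for family_id, label in zip(family_ids, labels):
        groups[label].append(family_id)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative IPC assignments, not taken from the dataset.
    ipc = {
        "FAM1": ["G01S 15/89", "G01C 13/00"],
        "FAM2": ["G01C 13/00", "B63G 8/00"],
        "FAM3": ["B63G 8/00", "G01C 13/00"],
    }
    vocab = ipc_vocabulary(ipc)
    print(ipc_matrix(ipc, vocab))  # [[0, 1, 1], [1, 1, 0], [1, 1, 0]]
    print(top_ipc_codes(ipc, n=2))  # [('G01C 13/00', 3), ('B63G 8/00', 2)]
    # Hypothetical labels, e.g. read off the t-SNE plot or produced by a clustering step.
    print(families_by_cluster(list(ipc), [0, 1, 1]))  # {0: ['FAM1'], 1: ['FAM2', 'FAM3']}
```

On the full set of 315 records, `embed_tsne(ipc_matrix(ipc, vocab))` would return one 2-D point per family for a scatter plot. Euclidean distance on the binary rows is the scikit-learn default; a set-based metric such as Jaccard could be considered as an alternative.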

Created:

2021-03-11