Analysis of IPC classification codes frequency in patents concerning "in situ" remediation technologies
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://data.mendeley.com/datasets/gk24h42jty
Description:
The patent dataset analysed here was built with search criteria designed to retrieve patent documents dealing with "in situ" remediation technologies. It was created in the context of the Horizon 2020 funded project "Posidon" (https://www.posidonproject.eu/). According to the European Environment Information and Observation Network for soil (EIONET-SOIL), the estimated number of potentially contaminated soil sites exceeds 2.5 million, of which about 14% (340 000 sites) are highly likely to be contaminated and hence in need of remediation measures. The management of contaminated sites is estimated to cost around 6 billion euros annually. The aim of the project is to foster the development of innovative technical solutions through pre-commercial procurement selection procedures, for which an initial elucidation of the prior art, based on an extensive analysis of patent documents, is fundamental. As Patlib centre staff members who also sit on the "monitoring board" of Posidon, we provide evidence that patent documents already disclose a considerable number of decontamination technologies applicable to the "in situ" reclamation of contaminated soil and/or water. Since we are especially interested in the trends of the technologies that are cited most frequently within the patent dataset, we illustrate one way of "unpacking" the dataset by identifying recurrent patterns of IPC classification codes. To this end, the IPC classification codes of each patent family in the dataset are analysed in successive stages, isolating and clustering the patent documents that share specific patterns of IPC subgroups, main groups and subclasses.
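The three IPC levels named above (subclass, main group, subgroup) can be read directly off an IPC symbol. The following is a minimal sketch, not the authors' code: the function name `ipc_levels`, the regular expression and the sample symbols are illustrative assumptions, and the pattern only covers symbols written in the common "B09C 1/10" form.

```python
import re

# Assumed input form: section letter, two-digit class, subclass letter,
# main group number, "/", subgroup number, e.g. "B09C 1/10".
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(symbol):
    """Return the subclass, main group and subgroup of one IPC symbol."""
    match = IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable IPC symbol: {symbol!r}")
    section, cls, subclass_letter, group, subgroup = match.groups()
    subclass = f"{section}{cls}{subclass_letter}"
    return {
        "subclass": subclass,
        # By IPC convention a main group is written with "/00" after the slash.
        "main_group": f"{subclass} {int(group)}/00",
        "subgroup": f"{subclass} {int(group)}/{subgroup}",
    }
```

For example, `ipc_levels("B09C 1/10")` yields subclass `B09C`, main group `B09C 1/00` and subgroup `B09C 1/10`, so the same family can be described at any of the three levels used in the successive clustering stages.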
In each phase, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is applied to an array of patent families encoded by the presence or absence of IPC subgroups, main groups or subclasses, chosen among the most frequent in the dataset. After the first round of clustering, the patent documents sharing specific patterns of IPC subgroups are isolated and set aside for additional investigation. The remaining patent documents undergo a second t-SNE phase, after which those sharing specific patterns of IPC main groups are likewise isolated. The documents still remaining undergo a final t-SNE clustering that separates them according to specific patterns of IPC subclasses. With this procedure, about 90% of the initial dataset (1632 simple families, as defined by the European Patent Office) is "unpacked" and clustered. Further assessments based on the patent bibliographic data can then be performed. The essential advantages are that the technical content of each cluster is homogeneous and that the results of different clusters can subsequently be aggregated when a specific IPC code is common to them.
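The presence/absence array described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `presence_matrix`, the family identifiers and the cut-off `top_n` are hypothetical, and the description does not say how many of the most frequent codes were retained at each stage.

```python
from collections import Counter


def presence_matrix(families, top_n):
    """Build a 0/1 table: one row per family, one column per frequent code.

    `families` maps a family identifier to the list of IPC codes it carries
    (at whichever level is being analysed: subgroup, main group or subclass).
    """
    # Count each code once per family, so frequency = number of families.
    counts = Counter(code for codes in families.values() for code in set(codes))
    # Most frequent first; ties broken alphabetically for a stable column order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    columns = [code for code, _ in ranked[:top_n]]
    rows = {}
    for family, codes in families.items():
        present = set(codes)
        rows[family] = [1 if code in present else 0 for code in columns]
    return columns, rows


# Hypothetical toy input; real data would come from the patent dataset.
example_families = {
    "FAM1": ["B09C 1/10", "C02F 3/34"],
    "FAM2": ["B09C 1/10", "B09C 1/08"],
    "FAM3": ["C02F 1/72"],
}
example_columns, example_rows = presence_matrix(example_families, top_n=2)
```

The rows of such a table could then be passed to a t-SNE implementation, for instance `sklearn.manifold.TSNE` in scikit-learn, to obtain the low-dimensional embedding in which families with similar code patterns sit close together. The choice of library and of its parameters is an assumption here, since the description does not name the software used.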
Created:
2022-03-01



