Hydrogen Energy Industrial Chain Innovation Factor Data

Zhejiang Provincial Data Intellectual Property Registration Platform, updated 2025-09-05, indexed 2025-09-06
Download link:
https://www.zjip.org.cn/home/announce/trends/175434
Resource summary:

Industrial chain label statistics and innovation factor data on patents, enterprises, and talents are obtained through collection, indexing, and analysis and updated monthly, building a dynamic data network that covers the entire industrial chain. The data serves as a reference for government decisions on industrial planning, investment and talent attraction, and enterprise cultivation, and as support for enterprises' R&D project initiation and competitor analysis.

For government departments: the "Statistical Data" shows the current state and innovation hotspots of patent technologies, innovative enterprises, and inventors at each level of the global and Chinese industrial chains. From it, they can identify the links with high innovation activity, those with high enterprise concentration, and those that still have gaps or weaknesses, and target support at those links when drafting industrial plans. The "Enterprise Data" and "Talent Data" let them pinpoint innovative enterprises and high-end talents by patent publication and invention publication counts, for targeted investment attraction, talent recruitment, and enterprise cultivation.

For enterprises: the "Statistical Data" gives an overview of patent filings across the whole industry and reveals the sub-sectors where technological innovation is active, so that enterprises can follow new technologies there and upgrade their own products accordingly. The "Enterprise Data" lets them compare indicators such as patent counts with other enterprises in the same industry to understand their position in the industry and assess their strengths and weaknesses.
By analyzing the patents of different enterprises in the "Patent Data", enterprises can track competitors' R&D directions and innovation results, avoid duplicating R&D when initiating projects, and find directions for differentiated competition; the "Talent Data" can also help them discover promising talents. The industrial chain data is updated monthly. The sample data covers records up to December 2024.

1. Data Cleaning

Format parsing and field extraction: regular expressions (regex) and file-parsing functions parse the collected Chinese and foreign-language source data on patents, enterprises, and talents, which arrive in formats such as CSV, JSON, and XML, and extract predefined fields (e.g., "application date" for patent data, "date of establishment" for enterprise data, and "current affiliated institution" for talent data).

Handling non-standard data: missing values are filled in and duplicate records are removed.

2. Data Processing

Extracting and coding sub-sectors at all levels of the industrial chain: a BERT-based text classification model takes multi-dimensional text (e.g., enterprise names, patent titles) as input, extracts features with the pre-trained model, outputs the sub-sector label for each level of the industrial chain, and assigns industry codes according to the industry level.

Indexing industrial chain data: text is indexed automatically by an industrial chain indexing model, and patent data is retrieved and indexed by combining keyword matching with patent classification codes. This yields the patent, enterprise, and talent data related to each sub-sector at every level of the industrial chain, and assigns industry codes and sub-sector labels to these innovation factor data.
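The cleaning steps and the keyword-plus-classification indexing described above can be sketched as follows. This is a minimal illustration, not the provider's pipeline: the field names in `PATENT_FIELDS`, the `"UNKNOWN"` fill value, the record layouts assumed for JSON and XML, and the rule that a patent needs both a keyword hit and an IPC hit are all assumptions. `HYDROGEN_IPC` lists a few real IPC groups related to hydrogen (hydrogen production, water electrolysis, fuel cells) purely as an example; the dataset's own keyword and classification lists are not published.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical predefined patent fields; the dataset's real schema is not published.
PATENT_FIELDS = ("application_number", "title", "application_date", "ipc", "original_applicant")

# Matches dates written as 2024-12-01, 2024/12/01 or 2024.12.01.
DATE_RE = re.compile(r"(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})")


def normalize_date(raw):
    """Return an ISO date (YYYY-MM-DD), or None if no date is found."""
    m = DATE_RE.search(raw or "")
    if m is None:
        return None
    year, month, day = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def extract(record):
    """Keep only the predefined fields; absent or empty fields become None."""
    out = {field: (record.get(field) or None) for field in PATENT_FIELDS}
    out["application_date"] = normalize_date(out["application_date"])
    return out


def parse_csv(text):
    return [extract(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    # Assumes a top-level JSON array of objects.
    return [extract(obj) for obj in json.loads(text)]


def parse_xml(text):
    # Assumes <root><record><field>value</field>...</record>...</root>.
    root = ET.fromstring(text)
    return [extract({child.tag: child.text for child in rec}) for rec in root]


def clean(records, fill="UNKNOWN"):
    """Drop repeated application numbers (first occurrence wins), then fill gaps."""
    seen, result = set(), []
    for rec in records:
        key = rec["application_number"]
        if key is not None and key in seen:
            continue
        seen.add(key)
        result.append({k: (fill if v is None else v) for k, v in rec.items()})
    return result


def index_patent(rec, keywords, ipc_prefixes):
    """Expects a cleaned record; in scope only if title AND an IPC code both match."""
    title = rec["title"].casefold()
    codes = [c.replace(" ", "") for c in rec["ipc"].split(";")]
    has_keyword = any(k.casefold() in title for k in keywords)
    has_ipc = any(c.startswith(p) for c in codes for p in ipc_prefixes)
    return has_keyword and has_ipc


# Illustrative hydrogen-related IPC prefixes (spaces removed), not the dataset's own list.
# The trailing "/" keeps "C01B3/" from also matching C01B 33 (silicon compounds).
HYDROGEN_IPC = ("C01B3/", "C25B1/04", "H01M8/")
```

Requiring both signals favors precision; an OR rule would raise recall at the cost of more off-topic patents, and which trade-off the provider uses is not stated.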
Fusion indexing of patent, enterprise, and talent data: a classification-label extraction and fusion algorithm links multi-source heterogeneous fields, such as the "original applicant" in patent data, the "enterprise name" in enterprise data, and the "current affiliated institution" in talent data. This further assigns industry codes and sub-sector labels to the patent, enterprise, and talent data, and finally produces complete, standardized industrial chain label information in the form of industrial chain patent data, industrial chain enterprise data, and industrial chain talent data.

Processing of personal and public data: personal data (e.g., inventors) is anonymized, and sensitive information in public data (e.g., enterprise names) is de-identified, so that the original data cannot be restored by any algorithm.

3. Data Statistical Analysis

Sub-sector statistics: based on the industry code labels at each level, the Chinese and foreign-language patent, enterprise, and talent data are counted for every sub-sector at every level of the industrial chain. The results are aggregated by sub-sector classification into the "Statistical Data" report, which shows how innovation factors are distributed along the industrial chain.
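The fusion and personal-data steps above can be sketched as follows. The provider's fusion algorithm is not described, so this sketch swaps in plain exact matching on crudely normalized organisation names; the field names (`enterprise_name`, `original_applicant`, `current_institution`, `inventor`, `codes`) and the function names are hypothetical. Inventor names are replaced with keyed HMAC-SHA256 tokens so that the same person still maps to the same token and can be counted. Note that this is pseudonymization: anyone holding the key can re-link tokens to names by hashing candidate names, so matching the source's stronger "cannot be restored" guarantee would need additional measures, such as discarding the key after processing.

```python
import hashlib
import hmac
import re


def normalize_name(name):
    """Casefold and drop whitespace/punctuation so name variants can match (crude)."""
    return re.sub(r"[\W_]+", "", name or "").casefold()


def pseudonymize(name, key):
    """Replace a person's name with a keyed HMAC-SHA256 token (16 hex chars)."""
    return hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def fuse(patents, enterprises, talents, key):
    """Link the three sources by organisation name and propagate industry codes."""
    # Normalized enterprise name -> union of industry codes assigned so far.
    codes = {}
    for e in enterprises:
        codes.setdefault(normalize_name(e["enterprise_name"]), set()).update(e["codes"])
    # A patent's codes also count for its original applicant, if that applicant is known.
    for p in patents:
        org = normalize_name(p["original_applicant"])
        if org in codes:
            codes[org].update(p["codes"])
    # Patents pick up their applicant's codes as well.
    fused_patents = [
        {**p, "codes": sorted(set(p["codes"]) | codes.get(normalize_name(p["original_applicant"]), set()))}
        for p in patents
    ]
    fused_enterprises = [
        {**e, "codes": sorted(codes[normalize_name(e["enterprise_name"])])}
        for e in enterprises
    ]
    # Talents inherit their institution's codes; inventor names are pseudonymized.
    fused_talents = [
        {
            **t,
            "inventor": pseudonymize(t["inventor"], key),
            "codes": sorted(codes.get(normalize_name(t["current_institution"]), set())),
        }
        for t in talents
    ]
    return fused_patents, fused_enterprises, fused_talents
```

Real entity resolution usually needs more than exact matches (abbreviations, subsidiaries, Chinese and English name variants), so the `normalize_name` step is the part most likely to differ in practice.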
For example, counting the "application number" entries in the "Patent Data" under each industry code at each level gives the number of patents per sub-sector in the "Statistical Data"; counting the "enterprise name" entries in the "Enterprise Data" gives the number of enterprises per sub-sector; and counting the "inventor" entries in the "Talent Data" gives the number of talents per sub-sector.
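The per-level counting above can be sketched as follows, assuming a hypothetical hierarchical coding scheme in which every level of the industrial chain appends two digits to its parent's code (e.g. `01` → `0102` → `010203`); the dataset's actual coding scheme is not described. Each record is counted under its own code and under every ancestor code, and identifiers are collected in sets so that a record tagged with several codes under the same parent is counted only once there.

```python
from collections import defaultdict

# Hypothetical scheme: each industrial chain level appends two digits to the parent code.
DIGITS_PER_LEVEL = 2


def encode(path_ids):
    """Turn per-level sub-sector indices, e.g. [1, 2, 3], into a code such as '010203'."""
    return "".join(f"{i:0{DIGITS_PER_LEVEL}d}" for i in path_ids)


def ancestors(code):
    """Return the code's prefixes from level 1 down to its own level."""
    return [code[:n] for n in range(DIGITS_PER_LEVEL, len(code) + 1, DIGITS_PER_LEVEL)]


def count_by_code(records, id_field):
    """Count distinct id_field values under every industry code at every level."""
    buckets = defaultdict(set)
    for rec in records:
        for code in rec["codes"]:
            for prefix in ancestors(code):
                buckets[prefix].add(rec[id_field])
    return {code: len(ids) for code, ids in sorted(buckets.items())}
```

The same function covers all three counts in the example: pass `"application_number"` for patents, `"enterprise_name"` for enterprises, or `"inventor"` for talents (after pseudonymization, distinct tokens still correspond to distinct inventors).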
Provider:
六棱镜(杭州)科技有限公司
Created:
2025-05-15
Dataset Introduction

Background Overview
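This dataset focuses on the hydrogen energy industrial chain and contains innovation factor data on patents, enterprises, and talents up to December 2024, totaling 672 records and updated monthly. Through multi-source data fusion and statistical analysis, it supports industrial decision-making by government departments and R&D innovation by enterprises; it features anonymization of personal data and dynamic updates.
The above content was collected and summarized by 遇见数据集.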