
Three types of company and technology datasets towards a unified graph model

Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/vvdcjtkg8k
Description:
We collect three types of publicly available datasets. 1) Company Data: We collect data on promising small and medium-sized enterprises (SMEs) located in Gyeonggi Province from 2019 to 2021, sourced from the Gyeonggi Open Data Portal, "Gyeonggi Data Dream." 2) Technology Keyword Data: Using prompt engineering with the ChatGPT service [3], we define a set of 14 initial technology keywords. Because 14 keywords are too few to construct a graph, we extend them using a patent database: we extract core technology keyword sets through query expansion seeded with the initial keyword set. 3) Patent Data: We acquire patent disclosure and registration datasets from the Patent Office's KIPRIS PLUS patent information utilization service. Finally, we construct the graph data. Across a total of 47,385 patent records with 727 SMEs as applicants, the relationship between each company and the technology keywords appearing in its patents is modeled as a heterogeneous graph. The graph has two types of nodes, Company (727 nodes) and Technology (1,957 nodes), and three types of edges: Use, which indicates that a particular company uses a particular technology; Share, which indicates that two companies share a common technology in their respective main products; and Relate, which indicates that two technologies are used in the same patent document. We generate the graph as data objects provided by the pytorch-geometric library. So that the data can be reconstructed, we converted it into PyTorch tensors and stored it as a pickle file, which can be loaded by calling "torch.load('Graph.pt')". Note that the original datasets were written in Korean. We translated them into English for reference: (eng) init_keywords.txt, (eng) expanded_keywords.txt, (eng) company_data.csv, and (eng) patents.csv.
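To make the three edge types concrete, here is a minimal sketch in plain Python of how Use, Share, and Relate edges could be derived from patent records. It is not the dataset authors' code. The toy records, the field names `applicant` and `keywords`, and the helper `build_edges` are all hypothetical, and the Share rule used here (two companies that use at least one common keyword) is an assumed simplification of the "main products" criterion in the description.

```python
from itertools import combinations

# Hypothetical toy records: one applicant company and its technology keywords per patent.
patents = [
    {"applicant": "CompanyA", "keywords": ["battery", "sensor"]},
    {"applicant": "CompanyB", "keywords": ["sensor", "robot"]},
    {"applicant": "CompanyC", "keywords": ["display"]},
]

def build_edges(records):
    """Derive (use, share, relate) edge sets from patent records."""
    use = set()     # (company, technology)
    relate = set()  # (technology, technology), co-occurring in one patent
    for record in records:
        keywords = sorted(set(record["keywords"]))
        for keyword in keywords:
            use.add((record["applicant"], keyword))
        for a, b in combinations(keywords, 2):
            relate.add((a, b))

    # Group technologies by company, then link companies with any overlap.
    # Assumption: overlap of used keywords stands in for the Share criterion.
    techs = {}
    for company, keyword in use:
        techs.setdefault(company, set()).add(keyword)
    share = {
        (a, b)
        for a, b in combinations(sorted(techs), 2)
        if techs[a] & techs[b]
    }
    return use, share, relate

use_edges, share_edges, relate_edges = build_edges(patents)
```

Pairs are stored in sorted order so that each undirected Share or Relate edge appears once. Turning these sets into the index tensors of a pytorch-geometric data object would be a separate step that this sketch does not cover.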
Created:
2024-01-23