Climate Risk Prediction Dataset for US Listed Insurance Companies (2000-2023, with Dual-layer Network Structure and GNN Code)
DataCite Commons | Updated 2026-03-20 | Indexed 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=35c407a5e6414650b491c5523235fb42
Description:
This dataset supports the study "Insurance Climate Risk Prediction: A Framework via Enhanced Graph Neural Networks with Dual-layer Network Analysis". It covers climate risk, supply chain relationships, and asset return data for US-listed insurance companies and their associated enterprises from 2000 to 2023. Data sources are the FactSet Revere supply chain relationship database, Yahoo Finance stock prices, NOAA climate disaster loss data, and Fama-French three-factor data.

Data processing proceeds in the following steps.

First, the first-layer network is built from FactSet Revere supply chain relationships and captures several types of links between enterprises, such as supplier, customer, competitor, and partner relationships. The adjacency matrix is constructed by summing the weights of all relationship types that link each pair of firms. The raw supply chain data contain 13,099 records; after completeness screening (requiring data completeness above 75% for both parties), the resulting network has 143 nodes (58 insurance companies and 85 associated enterprises).

Second, using daily stock prices from 2020-2023, the Fama-French three-factor model (market, size, and value factors) is applied to remove systematic risk and isolate the idiosyncratic, firm-specific component of each asset's returns. The second-layer asset correlation network is then built from the Pearson correlation coefficients between these idiosyncratic components.

Node features combine climate and financial dimensions. The climate features are derived from NOAA economic loss time series for four types of extreme weather events (storms, winter storms, drought/wildfire, and floods), summarized by statistics such as the mean, volatility, value at risk (VaR), and conditional value at risk (CVaR). The financial features are extracted from company stock returns.
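As a rough illustration of the two network layers, the Python sketch below sums relationship-type weights into a symmetric adjacency matrix after the >75% completeness screen, and builds a Pearson correlation matrix from the residuals of a Fama-French three-factor regression. It is not the authors' code: the column names (`source`, `target`, `rel_type`), the per-firm `completeness` scores, the default weight of 1.0 per relationship type, and the treatment of links as undirected are all assumptions made for illustration. The sketch also keeps every correlation, since the description does not say how (or whether) the second layer is sparsified.

```python
import numpy as np
import pandas as pd


def supply_chain_adjacency(edges, completeness, threshold=0.75, type_weights=None):
    """Layer 1 (sketch): sum relationship-type weights for every pair of firms.

    edges        -- DataFrame with hypothetical columns 'source', 'target', 'rel_type'
    completeness -- hypothetical dict mapping firm -> share of complete data (0..1)
    type_weights -- optional dict rel_type -> weight; assumption: 1.0 per type by default
    """
    # Completeness screen: keep a link only if both parties exceed the threshold.
    passing = {firm for firm, share in completeness.items() if share > threshold}
    kept = edges[edges["source"].isin(passing) & edges["target"].isin(passing)]

    nodes = sorted(set(kept["source"]) | set(kept["target"]))
    index = {firm: i for i, firm in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for src, dst, rel in kept[["source", "target", "rel_type"]].itertuples(index=False):
        if src == dst:
            continue  # ignore self-links
        w = 1.0 if type_weights is None else type_weights.get(rel, 1.0)
        i, j = index[src], index[dst]
        # Assumption: links are treated as undirected, so the weight goes both ways.
        adj[i, j] += w
        adj[j, i] += w
    return pd.DataFrame(adj, index=nodes, columns=nodes)


def idiosyncratic_correlation(excess_returns, factors):
    """Layer 2 (sketch): Pearson correlation of Fama-French three-factor residuals.

    excess_returns -- DataFrame (dates x firms) of daily returns minus the risk-free rate
    factors        -- DataFrame on the same dates with columns such as 'Mkt-RF', 'SMB', 'HML'
    """
    # Regress every firm on an intercept plus the three factors in one least-squares call.
    X = np.column_stack([np.ones(len(factors)), factors.to_numpy()])
    Y = excess_returns.to_numpy()
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta  # firm-specific part of each return series

    corr = np.corrcoef(residuals, rowvar=False)  # columns are firms
    np.fill_diagonal(corr, 0.0)  # no self-loops in the asset network
    cols = excess_returns.columns
    return pd.DataFrame(corr, index=cols, columns=cols)
```

Summing across types means that, under unit weights, a pair of firms linked as both supplier and customer receives weight 2, so multi-relationship pairs stand out in the first layer. In the second layer, two firms that load on the same factors but have independent firm-specific shocks end up with a near-zero correlation once the factor component is regressed out.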
Finally, a 16-dimensional feature matrix is constructed, and companies are classified into low-, medium-, and high-risk categories based on the Climate Change Impact Indicator (CCII).

The dataset covers 2000 to 2023. Node features are computed over the entire period to reflect long-term risk trends, while the network relationships focus on 2020-2023 to capture recent risk transmission dynamics. The spatial scope is limited to US insurance companies listed on the NYSE and NASDAQ and the enterprises linked to them through supply chains.

Data gaps occur mainly in specific climate variables and in the stock price records of some companies; they are filled using linear interpolation and moving-window averages. Discontinuous supply chain records are supplemented by manually checking annual reports and SEC filings. Errors stem mainly from county-level aggregation bias in the climate loss statistics and from market noise in the stock price data, and the estimated error is within 5%.

The dataset includes the following files.
The input folder contains two parts:
(1) networks folder: the dual-layer adjacency matrices, stored as 143×143 sparse matrices;
(2) features_and_labels folder, which contains three parts:
(2-1) the raw climate loss statistics, i.e. time series for the four event types, in millions of US dollars;
(2-2) statistics for the four risk-dimension indicators of each company;
(2-3) the 16-dimensional features and risk labels for the 143 nodes, with rows as node indices and columns as feature names (unitless).
The modules folder contains code for the MR-GNN-A, GRU-GNN, and K-hop GNN modules.
The results folder contains experimental results for multiple modules, namely model accuracy metrics, with rows corresponding to different GNN architectures and parameter settings.
All data files are in CSV format and the code files are in Python, so they can be read directly with common data analysis tools or a Python environment.
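The gap filling mentioned above (linear interpolation plus moving-window averages) and the per-event climate statistics (mean, volatility, VaR, CVaR) can be sketched as follows. This is an illustration under stated assumptions, not the dataset's own code: the description does not say how interpolation and moving averages are combined, which window length is used, or at what confidence level VaR and CVaR are computed. Here interior gaps are interpolated, edge gaps fall back to a centred 3-point moving average, and the 95% level with historical (empirical-quantile) VaR and CVaR is a hypothetical choice.

```python
import numpy as np
import pandas as pd


def fill_gaps(series, window=3):
    """Fill missing values in one time series (sketch).

    Interior gaps are linearly interpolated; gaps left at the edges are filled
    with a centred moving average over `window` points (window is an assumption).
    """
    filled = series.interpolate(method="linear", limit_area="inside")
    moving_avg = filled.rolling(window, min_periods=1, center=True).mean()
    # A value stays NaN only if its window contains no observation at all.
    return filled.fillna(moving_avg)


def loss_tail_stats(losses, alpha=0.95):
    """Mean, volatility, VaR and CVaR of a loss series (larger = worse).

    alpha is an assumed confidence level. VaR is the empirical alpha-quantile of
    the losses; CVaR is the average of the losses at or beyond that VaR.
    """
    x = np.asarray(losses, dtype=float)
    x = x[~np.isnan(x)]  # ignore any gaps that could not be filled
    var = float(np.quantile(x, alpha))
    cvar = float(x[x >= var].mean())
    return {
        "mean": float(x.mean()),
        "volatility": float(x.std(ddof=1)),  # sample standard deviation
        "VaR": var,
        "CVaR": cvar,
    }
```

For example, `fill_gaps(pd.Series([1.0, None, 3.0]))` fills the middle value with 2.0. Applying `loss_tail_stats` to each of the four event-type loss series yields one set of tail statistics per event type; how those are mapped to individual companies is not specified in the description.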
Provider:
Science Data Bank
Created:
2026-03-20



