Emissions Estimation Data Module V2

Name: Emissions Estimation Data Module V2
Creator: ESG Book
Published: 2024-01-25 16:22:35
License: No description available

Source: Snowflake | Updated: 2024-01-25 | Indexed: 2024-05-01

Download link:

https://app.snowflake.com/marketplace/listing/GZTDZ1ELJJ


Description:

## Overview

The ESG Book Estimated Emissions Data Module provides investors with estimated emissions for ~45,000 public corporate entities that do not disclose their emissions. The dataset includes estimates for Scope 1, Scope 2, and Scope 3 (total) emissions, as well as the 15 Scope 3 categories, in tonnes of CO2 equivalent. A confidence rating is provided alongside each estimated emissions figure, indicating the degree of accuracy of the estimate based on the amount of available data used in the estimation process. PCAF (Partnership for Carbon Accounting Financials) data quality indicators are also included.

The dataset additionally includes the actual reported emissions data of public companies. We currently cover 4,000 public companies, approximately half of which disclose their emissions data. Our Estimated Emissions Data Module thus significantly expands the coverage of emissions data for uses such as portfolio analysis and index creation.

## Methodology Overview

Emissions are estimated using the Extreme Gradient Boosting (XGBoost) model. XGBoost is a supervised machine learning model that identifies and analyses complex relationships between large numbers of predictor variables to generate estimates for unknown data. In this case, the model learns the relationship between 15 financial and non-financial predictor variables and emissions for each region, country, sector and industry, and uses it to estimate the emissions of companies that do not disclose emissions data.

We have chosen a machine learning estimation model rather than a traditional statistical regression model for several key reasons. Firstly, the XGBoost model is able to handle non-linear relationships.
As the predictor variables might be non-linearly correlated with emissions (for instance, a company with 500 employees might not generate 5 times the emissions of a company with only 100 employees, due to economies of scale), the ability of the XGBoost model to handle non-linear relationships provides an extra layer of robustness in capturing the relationships between the predictor variables and emissions.

Secondly, the XGBoost model is able to handle missing data, unlike conventional regression models or other machine learning models such as Adaptive Boosting (AdaBoost). Although 15 predictor variables are used in the model, not all 15 datapoints are available for every company. A minimum data threshold is therefore set, and the model estimates emissions only for companies that meet it. Conventional regression models cannot account for missing data: the missing values have to be interpolated or simply replaced with zeros. This introduces higher-order errors into the model and reduces the accuracy of the emissions estimates because the input data becomes ambiguous. The XGBoost model avoids this issue because it handles missing data natively.

Lastly, the XGBoost model uses a decision-tree algorithm to identify and analyse the complex relationships between the predictor variables and emissions, which are subsequently used in the estimation process. This allows for greater accuracy, as each new tree corrects the errors of the previous trees.

The parameters of the model are fine-tuned to increase the precision of the estimates. This is done using Optuna, an open-source hyperparameter optimization framework that tests different hyperparameter configurations on a holdout test set to determine the optimal values for a given regression.
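
The two ideas above (each tree correcting the errors of the previous ones, and tolerating missing predictors) can be illustrated with a minimal sketch. It is a toy stand-in written in plain Python, not the XGBoost library and not ESG Book's actual model: it boosts one-split trees ("stumps") on squared error, and each stump learns a default side for missing values, similar in spirit to XGBoost's default-direction handling. The function names, the two predictors (employees, revenue), and the synthetic data are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing predictor value


class Stump:
    """One-split regression tree with a learned default side for missing values."""

    def __init__(self, feature: int, threshold: float,
                 left: float, right: float, missing_left: bool) -> None:
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.missing_left = left, right, missing_left

    def predict(self, row: Row) -> float:
        value = row[self.feature]
        go_left = self.missing_left if value is None else value <= self.threshold
        return self.left if go_left else self.right


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def fit_stump(rows: Sequence[Row], residuals: Sequence[float]) -> Optional[Stump]:
    """Pick the split (and missing-value side) with the lowest squared error."""
    best, best_err = None, float("inf")
    for f in range(len(rows[0])):
        observed = sorted({r[f] for r in rows if r[f] is not None})
        for threshold in observed[:-1]:
            for missing_left in (True, False):
                left, right = [], []
                for row, res in zip(rows, residuals):
                    v = row[f]
                    go_left = missing_left if v is None else v <= threshold
                    (left if go_left else right).append(res)
                lm, rm = _mean(left), _mean(right)
                err = (sum((x - lm) ** 2 for x in left)
                       + sum((x - rm) ** 2 for x in right))
                if err < best_err:
                    best_err = err
                    best = Stump(f, threshold, lm, rm, missing_left)
    return best


def fit_boosted(rows: Sequence[Row], targets: Sequence[float],
                n_rounds: int = 50,
                learning_rate: float = 0.3) -> Tuple[float, List[Stump], float]:
    """Each round fits a stump to the residuals left by the previous rounds."""
    base = _mean(list(targets))
    preds = [base] * len(rows)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:  # no feature has two distinct observed values
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(r) for p, r in zip(preds, rows)]
    return base, stumps, learning_rate


def predict(model: Tuple[float, List[Stump], float], row: Row) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(s.predict(row) for s in stumps)


# Synthetic toy data (not real emissions): (employees, revenue), with revenue
# missing for some companies and a sub-linear link between size and emissions.
rows = [(100, 10.0), (200, None), (300, 28.0), (400, 35.0), (500, None), (600, 50.0)]
targets = [10 * employees ** 0.5 for employees, _ in rows]

model = fit_boosted(rows, targets)
estimate = predict(model, (350, None))  # company with an undisclosed revenue figure
```

Rows with a missing value are never imputed here: at every split they are routed to whichever side reduced the training error most, so no zeros or interpolated figures enter the fit. A production setup would use the XGBoost library itself and tune its hyperparameters on held-out data, as the text describes.
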

Overall, for the reasons explained above, the XGBoost model shows better accuracy than traditional statistical models such as Ridge Regression and other machine learning models such as Adaptive Boosting.

## Pricing Information

Pricing is determined on a use-case basis, so please contact us for more information. When making a request, please include the following information:

- Organization Name
- Position (non-mandatory)
- Business Email Address or Telephone Number
- Country
- Use-case

## Regulatory and Compliance Information

This product is licensed for internal use only; users are not allowed to distribute the data externally. If you are interested in a data re-distribution use case, please contact us.

## Need Help?

- If you have questions about our products, contact us at [support@esgbook.com](mailto:support@esgbook.com)

## About Your Company

- [ESG Book Website](https://www.esgbook.com/)

Provider:

ESG Book

Created:

2024-01-25
