aatrocy/fortune500-esg-metrics-2021-2023
Source: Hugging Face | Updated 2025-12-08 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/aatrocy/fortune500-esg-metrics-2021-2023
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
- time-series-forecasting
language:
- en
tags:
- esg
- sustainability
- climate
- finance
- corporate-governance
- environmental
- social-responsibility
- fortune-500
- carbon-emissions
- renewable-energy
pretty_name: Fortune 500 ESG Metrics Dataset (2021-2023)
size_categories:
- 1M<n<10M
dataset_info:
  features:
  - name: name
    dtype: string
  - name: year
    dtype: int64
  - name: metric_name
    dtype: string
  - name: value
    dtype: string
  - name: units
    dtype: string
  - name: additional_notes
    dtype: string
  splits:
  - name: train
    num_bytes: 1210000000
    num_examples: 500000
  download_size: 1130000000
  dataset_size: 1210000000
configs:
- config_name: default
  data_files:
  - split: train
    path: Fortune500_ESG_Metrics_2021-2023.csv
---
# Fortune 500 ESG Metrics Dataset (2021-2023)
## 🌍 Dataset Description
This comprehensive dataset contains Environmental, Social, and Governance (ESG) metrics from Fortune 500 companies spanning 2021-2023. It represents one of the most extensive collections of corporate sustainability data publicly available, compiled from official corporate reports, sustainability disclosures, and ESG filings.
### 🎯 Key Features
- **📊 Extensive Coverage**: Fortune 500 companies
- **📅 Multi-Year Data**: Complete data for 2021, 2022, and 2023
- **🔍 Detailed Metrics**: Hundreds of ESG indicators per company
- **📏 Standardized Format**: Consistent structure across all companies
- **📝 Rich Metadata**: Includes units and additional notes for context
## 📁 Dataset Structure
### Schema
| Column | Type | Description |
|--------|------|-------------|
| `name` | string | The specific metric or indicator name as reported |
| `year` | integer | Reporting year (2021, 2022, or 2023) |
| `metric_name` | string | Standardized metric identifier for cross-company comparison |
| `value` | string | The reported value (numeric or categorical) |
| `units` | string | Unit of measurement (e.g., MWh, tCO2e, %, count) |
| `additional_notes` | string | Additional context, methodology notes, or clarifications |
### 📊 Data Sample
```json
{
  "name": "Total Energy Consumption",
  "year": 2021,
  "metric_name": "energy_consumption_total",
  "value": "1234567",
  "units": "MWh",
  "additional_notes": "Includes all global facilities"
}
```
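Because `value` is stored as a string and may hold either a number or a category, it can help to separate the two before any analysis. The following is a minimal sketch of one way to do that; the helper name `parse_value` and the treatment of commas as thousands separators are assumptions, not something the dataset documents.

```python
from typing import Optional, Tuple


def parse_value(raw: Optional[str]) -> Tuple[Optional[float], Optional[str]]:
    """Split a raw `value` string into (numeric, categorical).

    Hypothetical helper: at most one element of the result is non-None;
    both are None when the input is missing or blank.
    """
    if raw is None:
        return None, None
    text = raw.strip()
    if not text:
        return None, None
    try:
        # Assumption: commas are thousands separators, as in "1,234,567".
        return float(text.replace(",", "")), None
    except ValueError:
        # Anything that does not parse as a number is kept as a category.
        return None, text


# The record from the data sample above.
record = {
    "name": "Total Energy Consumption",
    "year": 2021,
    "metric_name": "energy_consumption_total",
    "value": "1234567",
    "units": "MWh",
    "additional_notes": "Includes all global facilities",
}
numeric, category = parse_value(record["value"])
print(numeric, category)
```

Keeping the categorical text instead of discarding it avoids silently losing entries such as yes/no disclosures when the column is later coerced to numbers.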
## 🏢 Companies Included
The dataset covers major corporations across various industries:
### Technology
- Apple, Microsoft, Google, Amazon, Meta, IBM, Oracle, Salesforce
### Financial Services
- JPMorgan Chase, Bank of America, Wells Fargo, Goldman Sachs, Morgan Stanley
### Healthcare & Pharmaceuticals
- Johnson & Johnson, Pfizer, Abbott Laboratories, Merck, CVS Health
### Consumer Goods
- Walmart, Target, Procter & Gamble, Coca-Cola, PepsiCo
### Energy & Utilities
- ExxonMobil, Chevron, NextEra Energy, Duke Energy
### Manufacturing & Industrial
- General Electric, Boeing, Caterpillar, 3M, Honeywell
### And 450+ more Fortune 500 companies...
## 📈 Metrics Categories
### 🌱 Environmental Metrics
- **Energy**: Consumption, renewable energy usage, energy intensity
- **Emissions**: Scope 1, 2, and 3 GHG emissions, emission reduction targets
- **Water**: Usage, recycling, conservation efforts
- **Waste**: Generation, recycling rates, hazardous waste management
- **Biodiversity**: Land use, conservation initiatives
### 👥 Social Metrics
- **Workforce**: Diversity statistics, employee turnover, training hours
- **Safety**: Injury rates, safety incidents, health programs
- **Community**: Investment, volunteer hours, local hiring
- **Supply Chain**: Supplier diversity, audits, labor practices
### 🏛️ Governance Metrics
- **Board**: Composition, diversity, independence
- **Ethics**: Code of conduct violations, whistleblower reports
- **Risk Management**: ESG risk assessment, climate risk disclosure
- **Transparency**: Reporting standards, external verification
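To group rows by the three pillars listed above, one rough option is keyword matching on `metric_name`. The sketch below is only illustrative: the keyword lists and the `classify_metric` helper are assumptions loosely derived from the category bullets, and real metric identifiers in the CSV may need a different or larger vocabulary.

```python
from typing import Dict, Tuple

# Hypothetical keyword lists; the first pillar with a matching keyword wins.
# Governance is checked first so that "board_diversity" is not read as social,
# and environmental before social so "biodiversity" is not caught by "diversity".
PILLAR_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "governance": ("board", "ethics", "whistleblower", "risk", "transparency"),
    "environmental": ("energy", "emission", "ghg", "water", "waste",
                      "biodiversity", "renewable"),
    "social": ("workforce", "diversity", "turnover", "training", "safety",
               "injury", "community", "supplier"),
}


def classify_metric(metric_name: str) -> str:
    """Return the ESG pillar for a metric name, or "unclassified"."""
    lowered = metric_name.lower()
    for pillar, keywords in PILLAR_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return pillar
    return "unclassified"


for example in ("energy_consumption_total", "board_diversity_pct",
                "employee_turnover_rate"):
    print(example, "->", classify_metric(example))
```

Substring matching is crude (for instance, any name containing "risk" lands in governance), so it is worth reviewing the "unclassified" bucket and a sample of each pillar by hand.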
## 🚀 Usage Examples
### Loading the Dataset
```python
import pandas as pd
from datasets import load_dataset
# Method 1: Using the Hugging Face datasets library
dataset = load_dataset("GemiAI2025/fortune500-esg-metrics-2021-2023")
df = dataset['train'].to_pandas()
# Method 2: Read the CSV directly after downloading it locally
df = pd.read_csv("Fortune500_ESG_Metrics_2021-2023.csv")
```
### Basic Analysis
```python
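# View companies in dataset (assumes `name` values look like "<Company>_<year>")
companies = df['name'].str.extract(r'(.+?)_\d{4}')[0].dropna().unique()
print(f"Total companies: {len(companies)}")
# `value` is stored as a string, so convert it before aggregating
df['value_num'] = pd.to_numeric(df['value'], errors='coerce')
# Analyze emissions data (filter on `units` first if the rows mix units)
emissions_data = df[df['metric_name'].str.contains('emission', case=False, na=False)]
avg_emissions = emissions_data.groupby('year')['value_num'].mean()
# Track renewable energy adoption
renewable_energy = df[df['metric_name'].str.contains('renewable', case=False, na=False)]
renewable_trend = renewable_energy.groupby('year')['value_num'].mean()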
```
### Machine Learning Applications
```python
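# Prepare data for ESG score prediction
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Derive the columns the pivot needs (assumes `name` looks like "<Company>_<year>")
df['company'] = df['name'].str.extract(r'(.+?)_\d{4}')[0]
df['value'] = pd.to_numeric(df['value'], errors='coerce')
# Feature engineering for ML models: one row per company-year, one column per metric
pivot_data = df.pivot_table(
    index=['company', 'year'],
    columns='metric_name',
    values='value'
)
# Placeholder target: replace with the metric you actually want to predict
target_col = pivot_data.columns[0]
data = pivot_data.dropna(subset=[target_col])
features = data.drop(columns=[target_col]).fillna(0)  # simplistic imputation
targets = data[target_col]
# Use for sustainability prediction models
X_train, X_test, y_train, y_test = train_test_split(
    features, targets, test_size=0.2, random_state=42
)
# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)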
```
## 🎯 Use Cases
### 📊 Research & Analysis
- Academic research on corporate sustainability
- ESG performance benchmarking
- Sector-specific sustainability analysis
- Time-series analysis of ESG improvements
### 🤖 Machine Learning
- ESG score prediction models
- Sustainability risk assessment
- Anomaly detection in reporting
- Predictive analytics for future targets
### 💼 Business Applications
- Investment screening and due diligence
- Competitive analysis
- Supply chain sustainability assessment
- Regulatory compliance monitoring
### 📚 Educational
- Case studies for business schools
- Data science projects
- Sustainability course materials
- Research datasets for thesis work
## 📋 Data Collection Methodology
1. **Source Documents**: Data extracted from:
- Annual Sustainability Reports
- CDP (Carbon Disclosure Project) submissions
- GRI (Global Reporting Initiative) reports
- SEC ESG disclosures
- Corporate integrated reports
2. **Standardization Process**:
- Metric names standardized across companies
- Units converted to common standards where possible
- Temporal alignment for year-over-year comparison
3. **Quality Assurance**:
- Cross-validation with multiple sources
- Outlier detection and verification
- Completeness checks
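The unit-conversion step of the standardization process can be illustrated with energy units. This is a sketch under stated assumptions, not the curator's actual pipeline: the `TO_MWH` table, the `normalize_energy` helper, the `value_num` column, and the sample rows are all made up for illustration. The factors themselves follow from 1 MWh = 3.6 GJ.

```python
import pandas as pd

# Conversion factors to MWh; units not listed here are left untouched.
TO_MWH = {"kWh": 0.001, "MWh": 1.0, "GWh": 1000.0, "GJ": 1 / 3.6}


def normalize_energy(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with energy rows expressed in MWh in `value_num`."""
    out = frame.copy()
    # Non-numeric values (e.g. "n/a") become NaN rather than raising.
    out["value_num"] = pd.to_numeric(out["value"], errors="coerce")
    factor = out["units"].map(TO_MWH)  # NaN where the unit is not an energy unit
    out["value_num"] = out["value_num"] * factor.fillna(1.0)
    # Relabel only the rows that were actually converted.
    out["units"] = out["units"].where(factor.isna(), "MWh")
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "value": ["2.5", "3600", "n/a", "40"],
    "units": ["GWh", "GJ", "MWh", "%"],
})
normalized = normalize_energy(sample)
print(normalized)
```

Writing the converted number to a separate `value_num` column keeps the originally reported string available for auditing.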
## ⚠️ Important Considerations
### Data Limitations
- **Reporting Standards**: Companies may use different methodologies
- **Coverage Gaps**: Not all companies report all metrics
- **Temporal Differences**: Fiscal years may vary between companies
- **Voluntary Disclosure**: Some metrics are not mandatory
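Coverage gaps can be quantified before choosing which metrics to analyze. The sketch below computes, for each metric and year, the share of companies that report a value. It assumes a `company` column is already present (the raw schema has no such column, so it has to be derived first); the `metric_coverage` helper and the sample rows are hypothetical.

```python
import pandas as pd


def metric_coverage(frame: pd.DataFrame, company_col: str = "company") -> pd.DataFrame:
    """Share of companies reporting each metric, per year (0.0 to 1.0)."""
    # Denominator: every company that appears at all in a given year.
    companies_per_year = frame.groupby("year")[company_col].nunique()
    # Numerator: companies with a non-missing value for the metric that year.
    reported = frame.dropna(subset=["value"])
    counts = (
        reported.groupby(["metric_name", "year"])[company_col]
        .nunique()
        .unstack("year", fill_value=0)
    )
    return counts.div(companies_per_year, axis="columns")


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "company": ["A", "A", "B", "B", "A", "B"],
    "year": [2021, 2021, 2021, 2021, 2022, 2022],
    "metric_name": ["scope1", "water", "scope1", "water", "scope1", "scope1"],
    "value": ["10", "5", "12", None, "9", "11"],
})
coverage = metric_coverage(sample)
print(coverage)
```

Metrics with low coverage in one or more years are poor candidates for cross-company benchmarks or year-over-year comparisons.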
### Recommended Preprocessing
```python
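# Convert values to numeric; non-numeric (categorical) entries become NaN
df['value'] = pd.to_numeric(df['value'], errors='coerce')
# Extract company names (assumes `name` values look like "<Company>_<year>")
df['company'] = df['name'].str.extract(r'(.+?)_\d{4}')[0]
# Sort by year so that pct_change compares consecutive reporting years
df = df.sort_values(['company', 'metric_name', 'year'])
# Create year-over-year change metrics
df['yoy_change'] = df.groupby(['company', 'metric_name'])['value'].pct_change()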
```
## 📖 Citation
If you use this dataset in your research or applications, please cite:
```bibtex
@dataset{fortune500_esg_metrics_2023,
title = {Fortune 500 ESG Metrics Dataset (2021-2023)},
author = {GemiAI2025},
year = {2023},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/GemiAI2025/fortune500-esg-metrics-2021-2023}
}
```
## 📜 License
This dataset is released under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to:
- **Share**: Copy and redistribute the material in any medium or format
- **Adapt**: Remix, transform, and build upon the material for any purpose, even commercially
## 🤝 Contributing
We welcome contributions to improve and expand this dataset:
- Report issues or inconsistencies
- Suggest additional metrics or companies
- Share derivative datasets or analyses
## 📞 Contact
- **Dataset Curator**: GemiAI2025
- **Hugging Face Profile**: [@GemiAI2025](https://huggingface.co/GemiAI2025)
- **Issues**: Please use the [discussion tab](https://huggingface.co/datasets/GemiAI2025/fortune500-esg-metrics-2021-2023/discussions)
## 🙏 Acknowledgments
This dataset compilation was made possible through the transparency efforts of Fortune 500 companies and their commitment to ESG disclosure. Special thanks to the open data community for inspiration and support.
---
<div align="center">
Made with 💚 for the sustainability and data science community
</div>
Provider:
aatrocy



