andreaponti/NDC-sectors
Source: Hugging Face. Updated 2023-10-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/andreaponti/NDC-sectors
Resource description:
---
task_categories:
- text-classification
language:
- en
- es
tags:
- climate
pretty_name: NDC Sector Classification
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: "NDC_sectors.csv"
- config_name: sector_description
  data_files: "sectors.json"
---
# NDC Sector Classification
This dataset is built from Nationally Determined Contribution (NDC) paragraphs ([Climate Watch](https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=climate-watch&historical-emissions-gases=all-ghg&historical-emissions-regions=All%20Selected&historical-emissions-sectors=total-including-lucf%2Ctotal-including-lucf&page=1)) tagged by the [GIZ Data Service Center](https://www.giz.de/expertise/html/63018.html) and available on Hugging Face ([GIZ/policy_qa_v0](https://huggingface.co/datasets/GIZ/policy_qa_v0)).
The NDC URLs were taken from the [IGES NDC Database](https://www.iges.or.jp/en/pub/iges-indc-ndc-database/en).
An NDC is assigned to a sector if it contains at least one paragraph tagged with that sector, so a single NDC can be associated with multiple sectors.
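The document-level rule above can be sketched as a simple aggregation over paragraph-level labels. This is a minimal illustration, not the author's actual pipeline: the function name `sectors_per_document`, the document IDs, and the sector names are all hypothetical.

```python
from collections import Counter, defaultdict


def sectors_per_document(paragraph_labels):
    """Aggregate (document_id, sector) paragraph labels into per-document counts.

    A document is associated with a sector as soon as one of its paragraphs
    carries that sector label, so every sector listed has a count of at least 1.
    """
    counts = defaultdict(Counter)
    for doc_id, sector in paragraph_labels:
        counts[doc_id][sector] += 1
    return {doc_id: dict(counter) for doc_id, counter in counts.items()}


# Hypothetical paragraph-level labels, not taken from the dataset.
labels = [
    ("doc-a", "Energy"),
    ("doc-a", "Energy"),
    ("doc-a", "Transport"),
    ("doc-b", "Agriculture"),
]
doc_sectors = sectors_per_document(labels)
print(doc_sectors)
```

The per-document dictionary has the same shape as the `Sector` column described in the NDC Data section: sector names as keys, paragraph counts as values.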
The dataset contains 250 documents classified into 18 sectors. The following plot shows the number of documents tagged with each sector.

## NDC Data
The CSV file containing the tagged NDCs is structured as follows:
- `Country`: The country to which the NDC refers.
- `Document`: The type of document (INDC, First NDC, Second NDC).
- `Language`: The original language of the NDC.
- `Sector`: A JSON object whose keys are the sectors mentioned in the NDC and whose values are the number of paragraphs that mention each sector.
- `URL`: The URL of the PDF.
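A short sketch of how these columns might be read with the Python standard library. The sample rows below are invented for illustration (country names, sector names, language codes, and URLs are placeholders), and the code assumes the `Sector` cell holds valid JSON with double-quoted keys; if the real file stores it differently, the parsing step would need adjusting.

```python
import csv
import io
import json

# Made-up sample mimicking the documented columns; values are illustrative only.
SAMPLE_CSV = '''Country,Document,Language,Sector,URL
Exampleland,First NDC,en,"{""Energy"": 3, ""Transport"": 1}",https://example.org/ndc.pdf
Samplestan,INDC,es,"{""Agriculture"": 2}",https://example.org/indc.pdf
'''


def read_ndc_rows(handle):
    """Read NDC rows and decode the JSON-encoded Sector column into a dict."""
    rows = []
    for row in csv.DictReader(handle):
        row["Sector"] = json.loads(row["Sector"])
        rows.append(row)
    return rows


rows = read_ndc_rows(io.StringIO(SAMPLE_CSV))

# Count how many documents mention each sector at least once.
docs_per_sector = {}
for row in rows:
    for sector in row["Sector"]:
        docs_per_sector[sector] = docs_per_sector.get(sector, 0) + 1

print(docs_per_sector)
```

To run this against the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open handle on `NDC_sectors.csv`.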
## Sector Data
The JSON file containing the sector descriptions follows the schema below:
```json
{
  "topic_list_id": "UUID",
  "topics": [
    {
      "topic_id": "integer",
      "topic_name": "string",
      "definitions": [
        {
          "lang": "string",
          "description": "string"
        }
      ]
    }
  ]
}
```
**Note:** The descriptions have been taken from Wikipedia (en). The Spanish version is a translation of the English one.
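As a rough illustration of the schema, the sketch below looks up each sector's description in a given language. The sample document is a placeholder that follows the documented structure; the UUID, the topic name, the description texts, and the `en`/`es` language codes are assumptions, as is the helper name `descriptions_by_language`.

```python
import json

# Made-up document following the documented schema; IDs and text are placeholders.
SAMPLE_JSON = """
{
  "topic_list_id": "00000000-0000-0000-0000-000000000000",
  "topics": [
    {
      "topic_id": 1,
      "topic_name": "Energy",
      "definitions": [
        {"lang": "en", "description": "Placeholder English description."},
        {"lang": "es", "description": "Descripción de ejemplo en español."}
      ]
    }
  ]
}
"""


def descriptions_by_language(topic_list, lang):
    """Map each topic name to its description in the requested language."""
    result = {}
    for topic in topic_list["topics"]:
        for definition in topic["definitions"]:
            if definition["lang"] == lang:
                result[topic["topic_name"]] = definition["description"]
    return result


topic_list = json.loads(SAMPLE_JSON)
english = descriptions_by_language(topic_list, "en")
print(english)
```

Topics without a definition in the requested language are simply left out of the result, which keeps the lookup safe if a translation is missing.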
Provider:
andreaponti
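Original information summary
NDC Sector Classification dataset overview
Basic dataset information
- Task category: text classification
- Languages: English, Spanish
- Tags: climate
- Dataset name: NDC Sector Classification
- Dataset size: fewer than 1K
Dataset configurations
- Default configuration:
  - Data file: NDC_sectors.csv
  - Split: train
- sector_description configuration:
  - Data file: sectors.json
Dataset contents
- Number of documents: 250
- Number of sectors: 18
NDC data structure
- Fields:
  - `Country`: country
  - `Document`: document type
  - `Language`: original language
  - `Sector`: sectors (JSON; keys are sectors, values are the number of paragraphs mentioning that sector)
  - `URL`: PDF link
Sector data structure
- JSON format:
  - `topic_list_id`: UUID
  - `topics`:
    - `topic_id`: integer
    - `topic_name`: string
    - `definitions`:
      - `lang`: string
      - `description`: string
- Description source: the English descriptions come from Wikipedia; the Spanish descriptions are translations of the English ones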



