
andreaponti/NDC-sectors

Source: Hugging Face. Updated: 2023-10-24. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/andreaponti/NDC-sectors
Resource description:
---
task_categories:
- text-classification
language:
- en
- es
tags:
- climate
pretty_name: NDC Sector Classification
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: "NDC_sectors.csv"
- config_name: sector_description
  data_files: "sectors.json"
---

# NDC Sector Classification

This dataset is built from the tagged NDC paragraphs ([Climate Watch](https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=climate-watch&historical-emissions-gases=all-ghg&historical-emissions-regions=All%20Selected&historical-emissions-sectors=total-including-lucf%2Ctotal-including-lucf&page=1)) produced by the [GIZ Data Service Center](https://www.giz.de/expertise/html/63018.html) and available on Hugging Face ([GIZ/policy_qa_v0](https://huggingface.co/datasets/GIZ/policy_qa_v0)). The NDC URLs were taken from the [IGES NDC Database](https://www.iges.or.jp/en/pub/iges-indc-ndc-database/en).

An NDC is assigned to a sector if it contains at least one paragraph classified under that sector, so each NDC can be associated with multiple sectors. The dataset contains 250 documents classified into 18 sectors. The following plot shows the number of documents tagged with each sector.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6530ecfb753d5411b7e9ff11/RgjLrHhdomY3woSzlybMX.png)

## NDC Data

The CSV containing the tagged NDCs is structured as follows:

- `Country`: The country to which the NDC refers.
- `Document`: The type of document (INDC, First NDC, Second NDC).
- `Language`: The original language of the NDC.
- `Sector`: A JSON object whose keys are the sectors mentioned in the NDC and whose values are the number of paragraphs that mention each sector.
- `URL`: The URL of the PDF.
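
As a minimal sketch of how the `Sector` column might be consumed, the snippet below reads a few rows with the standard library and decodes the JSON stored in `Sector`. The sample rows, sector names, counts, and URLs are invented for illustration and are not taken from `NDC_sectors.csv`. The helper `parse_rows` is hypothetical, and the sketch assumes the JSON is stored as a quoted CSV field and that `Language` holds codes such as `en`.

```python
import csv
import io
import json

# Hypothetical sample in the documented column layout; all values are invented.
SAMPLE_CSV = '''Country,Document,Language,Sector,URL
Exampleland,First NDC,en,"{""Energy"": 12, ""Transport"": 3}",https://example.org/a.pdf
Samplestan,INDC,es,"{""Agriculture"": 5}",https://example.org/b.pdf
'''


def parse_rows(text):
    """Return the CSV rows as dicts, with `Sector` decoded from JSON."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Keys are sector names, values are paragraph counts.
        row["Sector"] = json.loads(row["Sector"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_CSV)
for row in rows:
    # A document belongs to every sector with at least one tagged paragraph.
    sectors = sorted(name for name, count in row["Sector"].items() if count >= 1)
    print(row["Country"], sectors)
```

To try this on the real data, the same function could be given the text of `NDC_sectors.csv` instead of the sample string.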
## Sector Data

The JSON file containing the sector descriptions follows the scheme below:

```json
{
  "topic_list_id": "UUID",
  "topics": [
    {
      "topic_id": "integer",
      "topic_name": "string",
      "definitions": [
        {
          "lang": "string",
          "description": "string"
        }
      ]
    }
  ]
}
```

**Note:** The descriptions were taken from Wikipedia (en). The Spanish version is a translation of the English one.
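
The scheme above can be walked with a few lines of Python. The following sketch builds a lookup from sector name to description for one language. The sample document, its UUID, and its descriptions are placeholders, the function `descriptions_by_language` is hypothetical, and the `lang` values `en` and `es` are assumed from the languages listed in the card.

```python
import json

# Hypothetical document following the documented scheme; all values are invented.
SAMPLE_JSON = '''
{
  "topic_list_id": "00000000-0000-0000-0000-000000000000",
  "topics": [
    {
      "topic_id": 1,
      "topic_name": "Energy",
      "definitions": [
        {"lang": "en", "description": "Placeholder English description."},
        {"lang": "es", "description": "Placeholder Spanish description."}
      ]
    }
  ]
}
'''


def descriptions_by_language(document, lang):
    """Map each topic_name to its description in `lang`; topics without one are skipped."""
    result = {}
    for topic in document["topics"]:
        for definition in topic["definitions"]:
            if definition["lang"] == lang:
                result[topic["topic_name"]] = definition["description"]
    return result


document = json.loads(SAMPLE_JSON)
english = descriptions_by_language(document, "en")
spanish = descriptions_by_language(document, "es")
print(english)
print(spanish)
```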
Provider:
andreaponti
Summary of original information

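NDC Sector Classification dataset overview

Basic dataset information

  • Task category: text classification
  • Languages: English, Spanish
  • Tags: climate
  • Dataset name: NDC Sector Classification
  • Dataset size: fewer than 1K

Dataset configurations

  • Default configuration:
    • Data file: NDC_sectors.csv
    • Split: train
  • sector_description configuration:
    • Data file: sectors.json

Dataset contents

  • Number of documents: 250
  • Number of sectors: 18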
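
The per-sector document totals can be derived from the decoded `Sector` values. The sketch below counts each document once for every sector in which it has at least one tagged paragraph. The three sample documents and their counts are invented for illustration and do not come from the dataset.

```python
from collections import Counter

# Hypothetical decoded `Sector` values, one dict per document; invented data.
documents = [
    {"Energy": 12, "Transport": 3},
    {"Agriculture": 5, "Energy": 1},
    {"Transport": 2},
]

docs_per_sector = Counter()
for sector_counts in documents:
    # Count each document once per sector, regardless of its paragraph count.
    docs_per_sector.update(
        name for name, count in sector_counts.items() if count >= 1
    )

for name, total in sorted(docs_per_sector.items()):
    print(name, total)
```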

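NDC data structure

  • Fields:
    • Country: the country
    • Document: the document type
    • Language: the original language
    • Sector: the sectors (JSON; keys are sectors, values are the number of paragraphs mentioning that sector)
    • URL: link to the PDF

Sector data structure

  • JSON format:
    • topic_list_id: UUID
    • topics:
      • topic_id: integer
      • topic_name: string
      • definitions:
        • lang: string
        • description: string
  • Description source: the English descriptions come from Wikipedia; the Spanish descriptions are translations of the English ones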