
ClimatePolicyRadar/national-climate-targets

Hugging Face. Updated 2024-04-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ClimatePolicyRadar/national-climate-targets
Resource description:
---
license: cc-by-4.0
dataset_info:
  features:
  - name: text
    dtype: string
  - name: annotation_agent
    dtype: int64
  - name: geography
    dtype: string
  - name: region
    dtype: string
  - name: translated
    dtype: bool
  - name: annotation_NZT
    dtype: int64
  - name: annotation_Reduction
    dtype: int64
  - name: annotation_Other
    dtype: int64
  splits:
  - name: train
    num_bytes: 2912069
    num_examples: 2610
  download_size: 1522649
  dataset_size: 2912069
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# National Climate Targets Training Dataset – Climate Policy Radar

A dataset of climate targets made by national governments in their laws, policies and UNFCCC submissions, which has been used to train a classifier. Text was sourced from the [Climate Policy Radar database](https://app.climatepolicyradar.org).

We define a target as an aim to achieve a specific outcome that is quantifiable and has a deadline. This dataset distinguishes between different types of targets:

- **Reduction** (a.k.a. emissions reduction): a target referring to a reduction in greenhouse gas emissions, either economy-wide or for a sector.
- **Net zero** (NZT): a commitment to balance GHG emissions with removals, effectively reducing net emissions to zero.
- **Other**: targets that do not fit into the Reduction or Net Zero categories but satisfy our definition of a target, e.g. renewable energy targets.

*IMPORTANT NOTE:* this dataset has been used to train a machine learning model, and **is not a list of all climate targets published by national governments**.

For more information on dataset creation, [see our paper](https://arxiv.org/abs/2404.02822).

## Dataset Description

This dataset includes 2,610 text passages containing 1,193 target mentions, annotated in a multilabel setting: one text passage can be assigned 0 or more target types. This breaks down as follows.
| Target type   | Number of passages |
|:--------------|-------------------:|
| NZT           |                203 |
| Reduction     |                359 |
| Other         |                631 |
| No Annotation |              1,584 |

It was annotated by 3 domain experts, with inter-annotator agreement measured to ensure consistency. Annotator `2` is a data scientist; their annotations combine sampled negatives with errors caught during post-hoc reviews.

All text is in English: the `translated` column records whether a passage was translated from another language using the Google Cloud Translation API.

In addition to the text and annotations, we include the document characteristics we use for equity calculations and an anonymised assignment of annotations to annotators.

For more information on the dataset and its creation, [see our paper](https://arxiv.org/abs/2404.02822).

## License

Our dataset is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Please read our [Terms of Use](https://app.climatepolicyradar.org/terms-of-use), including any specific terms relevant to commercial use. Contact partners@climatepolicyradar.org with any questions.

## Links

- [Paper](https://arxiv.org/abs/2404.02822)

## Citation

*Juhasz, M., Marchand, T., Melwani, R., Dutia, K., Goodenough, S., Pim, H., & Franks, H. (2024). Identifying Climate Targets in National Laws and Policies using Machine Learning. arXiv preprint arXiv:2404.02822.*

```
@misc{juhasz2024identifying,
      title={Identifying Climate Targets in National Laws and Policies using Machine Learning},
      author={Matyas Juhasz and Tina Marchand and Roshan Melwani and Kalyan Dutia and Sarah Goodenough and Harrison Pim and Henry Franks},
      year={2024},
      eprint={2404.02822},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}
```

## Authors & Contact

Climate Policy Radar team: Matyas Juhasz, Tina Marchand, Roshan Melwani, Kalyan Dutia, Sarah Goodenough, Harrison Pim, and Henry Franks.

https://climatepolicyradar.org
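Provider:
ClimatePolicyRadar
Summary of original information

National Climate Targets Training Dataset – Climate Policy Radar

Dataset description

The dataset contains 2,610 text passages with 1,193 target mentions, annotated in a multilabel setting: a passage can be assigned zero or more target types. The breakdown is as follows: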

Target type      Number of passages
NZT              203
Reduction        359
Other            631
No annotation    1,584
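As a consistency check, the per-type counts in the table can be reconciled with the 1,193 target mentions and the 2,610 passages. The following is a minimal Python sketch, assuming each `annotation_*` column is a 0/1 flag per passage; the table itself does not state the encoding, and the function name `label_arithmetic` is hypothetical.

```python
# Figures copied from the table above.
LABEL_COUNTS = {"NZT": 203, "Reduction": 359, "Other": 631}
TOTAL_PASSAGES = 2610
UNLABELLED_PASSAGES = 1584


def label_arithmetic(label_counts: dict[str, int], total: int, unlabelled: int) -> dict[str, int]:
    """Derive summary figures, assuming each annotation column is a 0/1 flag (an assumption)."""
    label_assignments = sum(label_counts.values())  # one per (passage, target type) pair
    labelled_passages = total - unlabelled          # passages with at least one target type
    # Each labelled passage contributes one assignment; any surplus comes from multi-label passages.
    extra_assignments = label_assignments - labelled_passages
    return {
        "label_assignments": label_assignments,
        "labelled_passages": labelled_passages,
        "extra_assignments": extra_assignments,
    }


summary = label_arithmetic(LABEL_COUNTS, TOTAL_PASSAGES, UNLABELLED_PASSAGES)
print(summary)
# {'label_assignments': 1193, 'labelled_passages': 1026, 'extra_assignments': 167}
```

The per-type counts sum to the 1,193 target mentions, while only 2,610 − 1,584 = 1,026 passages carry any label. Under the 0/1 assumption, the surplus of 167 means some passages are tagged with more than one target type, which the multilabel setting allows.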

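The dataset was annotated by 3 domain experts, and inter-annotator agreement was measured to ensure consistency. Annotator 2 is a data scientist whose annotations come from sampled negatives and from errors caught during post-hoc reviews.

All text is in English: the translated column indicates whether a passage was translated from another language using the Google Cloud Translation API. Besides the text and annotations, the dataset includes the document characteristics used for equity calculations and an anonymised assignment of annotations to annotators.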

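Dataset features

  • text: string, the passage text
  • annotation_agent: integer, annotator identifier
  • geography: string, geographic information
  • region: string, region information
  • translated: boolean, whether the text was translated into English
  • annotation_NZT: integer, Net Zero target annotation
  • annotation_Reduction: integer, Reduction target annotation
  • annotation_Other: integer, Other target annotation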
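The feature list above can be mirrored as a typed record. The following is a minimal Python sketch in which `TargetPassage`, `from_row` and `target_types` are hypothetical helpers, not part of the dataset or any library. It assumes the three `annotation_*` columns use 1 for "target type present" and 0 for "absent". The example row is made up for illustration and is not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Annotation column -> target type name, following the feature list above.
LABEL_COLUMNS = {
    "annotation_NZT": "NZT",
    "annotation_Reduction": "Reduction",
    "annotation_Other": "Other",
}


@dataclass(frozen=True)
class TargetPassage:
    """One row of the train split (hypothetical helper type)."""

    text: str
    annotation_agent: int
    geography: str
    region: str
    translated: bool
    annotation_NZT: int
    annotation_Reduction: int
    annotation_Other: int

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "TargetPassage":
        # Convert each field to the Python type matching the dtype in the dataset card.
        return cls(
            text=str(row["text"]),
            annotation_agent=int(row["annotation_agent"]),
            geography=str(row["geography"]),
            region=str(row["region"]),
            translated=bool(row["translated"]),
            annotation_NZT=int(row["annotation_NZT"]),
            annotation_Reduction=int(row["annotation_Reduction"]),
            annotation_Other=int(row["annotation_Other"]),
        )

    def target_types(self) -> list[str]:
        """Target types flagged for this passage; assumes 1 = present, 0 = absent."""
        return [name for column, name in LABEL_COLUMNS.items() if getattr(self, column) == 1]


# Made-up row for illustration only (not from the dataset).
row = {
    "text": "Reduce greenhouse gas emissions by 40% by 2030.",
    "annotation_agent": 0,
    "geography": "XYZ",
    "region": "Example region",
    "translated": False,
    "annotation_NZT": 0,
    "annotation_Reduction": 1,
    "annotation_Other": 0,
}
passage = TargetPassage.from_row(row)
print(passage.target_types())  # ['Reduction']
```

Because a passage can carry several target types, `target_types` returns a list rather than a single label, and an empty list corresponds to the "No annotation" row of the table.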

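Dataset splits

  • train: training split with 2,610 examples, 2,912,069 bytes in total

Dataset size

  • Download size: 1,522,649 bytes
  • Dataset size: 2,912,069 bytes

Configuration

  • default: the default configuration; its train split is read from the data files matching data/train-*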
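The default configuration maps the train split to the files matching data/train-* in the Hugging Face repository. The following is a minimal sketch for loading that split with the Hugging Face `datasets` library and turning the three annotation columns into multilabel target vectors for classifier training. The helper names `load_train_rows` and `to_multilabel` are hypothetical, and the 0/1 encoding of the annotation columns is an assumption.

```python
from typing import Any, Iterable, Mapping

REPO_ID = "ClimatePolicyRadar/national-climate-targets"
# Fixed column order for each target vector: [NZT, Reduction, Other].
TARGET_COLUMNS = ("annotation_NZT", "annotation_Reduction", "annotation_Other")


def load_train_rows() -> list[dict[str, Any]]:
    """Fetch the train split of the `default` config (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helper below works offline

    ds = load_dataset(REPO_ID, "default", split="train")
    return [dict(row) for row in ds]


def to_multilabel(rows: Iterable[Mapping[str, Any]]) -> tuple[list[str], list[list[int]]]:
    """Split rows into texts and target vectors, one 0/1 entry per target type."""
    texts: list[str] = []
    targets: list[list[int]] = []
    for row in rows:
        texts.append(row["text"])
        # Assumes 0/1 annotation columns; any non-zero value is treated as positive.
        targets.append([1 if row[col] else 0 for col in TARGET_COLUMNS])
    return texts, targets
```

With network access, `to_multilabel(load_train_rows())` should yield 2,610 texts, matching the split metadata. A row whose target vector is all zeros corresponds to a passage with no annotation.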

License

This dataset is licensed under CC BY 4.0.

The content above was collected and summarized by 遇见数据集.