
climatebert/climate_sentiment

Source: Hugging Face | Updated: 2023-04-18 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/climatebert/climate_sentiment
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- en
license: cc-by-nc-sa-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
pretty_name: ClimateSentiment
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': risk
          '1': neutral
          '2': opportunity
  splits:
  - name: train
    num_bytes: 492077
    num_examples: 1000
  - name: test
    num_bytes: 174265
    num_examples: 320
  download_size: 373638
  dataset_size: 666342
---

# Dataset Card for climate_sentiment

## Dataset Description

- **Homepage:** [climatebert.ai](https://climatebert.ai)
- **Repository:**
- **Paper:** [papers.ssrn.com/sol3/papers.cfm?abstract_id=3998435](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3998435)
- **Leaderboard:**
- **Point of Contact:** [Nicolas Webersinke](mailto:nicolas.webersinke@fau.de)

### Dataset Summary

We introduce an expert-annotated dataset for classifying climate-related sentiment of climate-related paragraphs in corporate disclosures.

### Supported Tasks and Leaderboards

The dataset supports a ternary sentiment classification task of whether a given climate-related paragraph has sentiment opportunity, neutral, or risk.

### Languages

The text in the dataset is in English.

## Dataset Structure

### Data Instances

```
{
  'text': '− Scope 3: Optional scope that includes indirect emissions associated with the goods and services supply chain produced outside the organization. Included are emissions from the transport of products from our logistics centres to stores (downstream) performed by external logistics operators (air, land and sea transport) as well as the emissions associated with electricity consumption in franchise stores.',
  'label': 1
}
```

### Data Fields

- text: a climate-related paragraph extracted from corporate annual reports and sustainability reports
- label: the label (0 -> risk, 1 -> neutral, 2 -> opportunity)

### Data Splits

The dataset is split into:

- train: 1,000
- test: 320

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

Our dataset contains climate-related paragraphs extracted from financial disclosures by firms. We collect text from corporate annual reports and sustainability reports.

For more information regarding our sample selection, please refer to the Appendix of our paper (see [citation](#citation-information)).

#### Who are the source language producers?

Mainly large listed companies.

### Annotations

#### Annotation process

For more information on our annotation process and annotation guidelines, please refer to the Appendix of our paper (see [citation](#citation-information)).

#### Who are the annotators?

The authors and students at Universität Zürich and Friedrich-Alexander-Universität Erlangen-Nürnberg with majors in finance and sustainable finance.

### Personal and Sensitive Information

Since our text sources contain public information, no personal and sensitive information should be included.

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

- Julia Anna Bingler
- Mathias Kraus
- Markus Leippold
- Nicolas Webersinke

### Licensing Information

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (cc-by-nc-sa-4.0). To view a copy of this license, visit [creativecommons.org/licenses/by-nc-sa/4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

If you are interested in commercial use of the dataset, please contact [markus.leippold@bf.uzh.ch](mailto:markus.leippold@bf.uzh.ch).

### Citation Information

```bibtex
@techreport{bingler2023cheaptalk,
    title={How Cheap Talk in Climate Disclosures Relates to Climate Initiatives, Corporate Emissions, and Reputation Risk},
    author={Bingler, Julia and Kraus, Mathias and Leippold, Markus and Webersinke, Nicolas},
    type={Working paper},
    institution={Available at SSRN 3998435},
    year={2023}
}
```

### Contributions

Thanks to [@webersni](https://github.com/webersni) for adding this dataset.
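The label scheme in the card (0 -> risk, 1 -> neutral, 2 -> opportunity) can be turned into a small lookup for decoding predictions or dataset rows. The following is a minimal sketch, not code from the dataset authors. The names `ID_TO_LABEL`, `decode_label`, and `load_climate_sentiment` are hypothetical, and the loader assumes the Hugging Face `datasets` library is installed and that the dataset is reachable on the Hub under the ID `climatebert/climate_sentiment`.

```python
from typing import Dict

# Label names as listed in the card's `class_label` metadata.
ID_TO_LABEL: Dict[int, str] = {0: "risk", 1: "neutral", 2: "opportunity"}
LABEL_TO_ID: Dict[str, int] = {name: idx for idx, name in ID_TO_LABEL.items()}


def decode_label(label_id: int) -> str:
    """Map an integer label from the dataset to its class name."""
    if label_id not in ID_TO_LABEL:
        raise ValueError(f"unexpected label id: {label_id!r}")
    return ID_TO_LABEL[label_id]


def load_climate_sentiment():
    """Hypothetical helper: fetch both splits from the Hugging Face Hub.

    Assumes `pip install datasets` and network access. The import is kept
    local so the rest of this sketch runs without the library.
    """
    from datasets import load_dataset

    return load_dataset("climatebert/climate_sentiment")


if __name__ == "__main__":
    # The example instance shown in the card carries label 1.
    print(decode_label(1))
```

With the library available, `load_climate_sentiment()["train"][0]` should yield a dictionary with `text` and `label` keys, whose label can be passed to `decode_label`.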
Provider:
climatebert
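Summary of original information

Dataset overview

Dataset name

  • Name: ClimateSentiment

Dataset description

  • Task type: text classification
  • Task details: ternary sentiment classification that determines whether a climate-related paragraph expresses opportunity, neutral, or risk sentiment.
  • Language: English

Dataset structure

  • Data fields:
    • text: string; a climate-related paragraph from corporate annual reports and sustainability reports.
    • label: class label; 0 = risk, 1 = neutral, 2 = opportunity.
  • Data splits:
    • Train: 1,000 examples
    • Test: 320 examples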
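The two-field structure described above (a `text` string and a `label` in {0, 1, 2}) lends itself to a quick sanity check before training. The sketch below is illustrative only: `is_valid_record` is a hypothetical helper, and the requirement that `text` be non-empty is an assumption that the dataset card does not state.

```python
from typing import Any, Mapping

VALID_LABELS = {0, 1, 2}  # 0 = risk, 1 = neutral, 2 = opportunity


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if `record` matches the two-field schema described above."""
    text = record.get("text")
    label = record.get("label")
    # bool is a subclass of int in Python, so exclude it explicitly.
    label_ok = (
        isinstance(label, int)
        and not isinstance(label, bool)
        and label in VALID_LABELS
    )
    # Assumption: an empty or whitespace-only paragraph counts as invalid.
    text_ok = isinstance(text, str) and bool(text.strip())
    return text_ok and label_ok
```

For example, `is_valid_record({"text": "Scope 3 emissions ...", "label": 1})` returns `True`, while a record with label `3` or a missing `text` field is rejected.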

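Dataset creation

  • Source data: financial disclosures, namely corporate annual reports and sustainability reports, mainly from large listed companies.
  • Annotators: the authors and students at Universität Zürich and Friedrich-Alexander-Universität Erlangen-Nürnberg majoring in finance and sustainable finance.

License information

  • License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (cc-by-nc-sa-4.0)

Dataset size

  • Download size: 373,638 bytes
  • Dataset size: 666,342 bytes
  • Number of examples: 1,000 train, 320 test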
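The size figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers from the dataset metadata (per-split `num_examples` and `num_bytes`, plus `dataset_size`). The variable names are illustrative, and the bytes-per-example value is just the reported byte count divided by the example count, not a measured text length.

```python
# Figures copied from the dataset metadata.
SPLITS = {
    "train": {"num_examples": 1000, "num_bytes": 492077},
    "test": {"num_examples": 320, "num_bytes": 174265},
}
DATASET_SIZE_BYTES = 666342

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# The per-split byte counts add up to the reported dataset size.
assert total_bytes == DATASET_SIZE_BYTES

for name, stats in SPLITS.items():
    share = stats["num_examples"] / total_examples
    mean_bytes = stats["num_bytes"] / stats["num_examples"]
    print(f"{name}: {share:.1%} of examples, about {mean_bytes:.0f} bytes per example")
```

The two splits total 1,320 examples, which works out to roughly a 76/24 train/test split.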

Dataset features

  • Features:
    • text: string
    • label: class label with three classes: risk, neutral, and opportunity.