
karoldobiczek/fomc-communication-counterfactual

Hugging Face · Updated 2024-05-29 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/karoldobiczek/fomc-communication-counterfactual
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
tags:
- finance
- counterfactual
size_categories:
- 1K<n<10K
---

Dataset adapted from [original work](https://huggingface.co/datasets/gtfintechlab/fomc_communication/) by Shah et al.

## About Dataset

The dataset is a collection of sentences from FOMC speeches, meeting minutes and press releases ([see corresponding paper](https://aclanthology.org/2023.acl-long.368)). A subset of the data has been manually annotated as **hawkish**, **dovish**, or **neutral**.

### Label mapping

- LABEL 2: Neutral
- LABEL 1: Hawkish
- LABEL 0: Dovish

### Counterfactual generation split

Additionally, for counterfactual generation tasks, we add a custom split with target classes in `test_with_targets.csv`.

## Cite

If you want to use this dataset, please consider citing the corresponding paper:

```bibtex
@inproceedings{shah-etal-2023-trillion,
    title = "Trillion Dollar Words: A New Financial Dataset, Task {\&} Market Analysis",
    author = "Shah, Agam and Paturi, Suvan and Chava, Sudheer",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.368",
    doi = "10.18653/v1/2023.acl-long.368",
    pages = "6664--6679",
    abstract = "Monetary policy pronouncements by Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under CC BY-NC 4.0 license.",
}
```
Provided by:
karoldobiczek
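Summary of original information

Dataset overview

Basic information

  • License: cc-by-nc-4.0
  • Task category: text classification
  • Language: English
  • Tags: finance, counterfactual
  • Size: 1K<n<10K

Dataset description

The dataset is a collection of sentences from FOMC speeches, meeting minutes, and press releases. A subset of the data has been manually annotated as hawkish, dovish, or neutral.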

Label mapping

  • LABEL 2: Neutral
  • LABEL 1: Hawkish
  • LABEL 0: Dovish
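
The mapping above can be written down as a small lookup table. The following is a minimal sketch, not code from the dataset authors: the names `ID2LABEL`, `LABEL2ID`, and `decode_label` are hypothetical, and the assumption that labels may arrive as strings such as `LABEL_1` reflects a common classifier output convention rather than anything stated in the dataset card.

```python
# Integer-to-name mapping as stated in the dataset card.
ID2LABEL = {0: "dovish", 1: "hawkish", 2: "neutral"}
# Reverse lookup, useful when encoding class names back to integers.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_label(value):
    """Return the class name for 0/1/2 or for strings like 'LABEL_1' or 'LABEL 1'.

    The string forms are an assumption about how a classifier might report labels.
    """
    if isinstance(value, str):
        value = int(value.strip().upper().replace("LABEL", "").strip(" _"))
    return ID2LABEL[value]


print(decode_label(1))          # hawkish
print(decode_label("LABEL_2"))  # neutral
```

Keeping both directions of the mapping in one place avoids the easy mistake of assuming that 0 means neutral; here 0 is dovish and 2 is neutral.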

Counterfactual generation split

Additionally, for counterfactual generation tasks, we add a custom split with target classes in test_with_targets.csv.
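
As an illustration of how a split with target classes might be consumed, the sketch below reads rows that pair a sentence with its original label and a target label, then measures how often a classifier's prediction on a generated counterfactual matches the target. Everything specific here is an assumption: the column names `sentence`, `label`, and `target`, the placeholder rows, the `predictions` list, and the helper names `read_targets` and `flip_rate` are hypothetical, and the dataset card does not document the actual layout of `test_with_targets.csv`.

```python
import csv
import io

# Stand-in for test_with_targets.csv; column names and rows are hypothetical.
SAMPLE_CSV = """sentence,label,target
placeholder sentence A,0,1
placeholder sentence B,2,0
placeholder sentence C,1,2
"""


def read_targets(handle):
    """Yield (sentence, original_label, target_label) tuples from a CSV handle."""
    for row in csv.DictReader(handle):
        yield row["sentence"], int(row["label"]), int(row["target"])


def flip_rate(rows, predicted):
    """Fraction of rows whose predicted label equals the requested target label."""
    rows = list(rows)
    hits = sum(1 for (_, _, target), pred in zip(rows, predicted) if pred == target)
    return hits / len(rows) if rows else 0.0


rows = list(read_targets(io.StringIO(SAMPLE_CSV)))
# Hypothetical classifier outputs for the three generated counterfactuals.
predictions = [1, 0, 1]
print(flip_rate(rows, predictions))
```

With the real file, the `io.StringIO` stand-in would be replaced by `open("test_with_targets.csv", newline="")`, and the column names adjusted to whatever the file actually contains.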

Citation

If you use this dataset, please consider citing the corresponding paper:

@inproceedings{shah-etal-2023-trillion,
    title = "Trillion Dollar Words: A New Financial Dataset, Task {\&} Market Analysis",
    author = "Shah, Agam and Paturi, Suvan and Chava, Sudheer",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.368",
    doi = "10.18653/v1/2023.acl-long.368",
    pages = "6664--6679",
    abstract = "Monetary policy pronouncements by Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under CC BY-NC 4.0 license.",
}
