karoldobiczek/fomc-communication-counterfactual
Dataset Overview
Basic Information
- License: cc-by-nc-4.0
- Task category: Text classification
- Language: English
- Tags: finance, counterfactual
- Size: 1K<n<10K
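Dataset Description
This dataset is a collection of sentences drawn from FOMC speeches, meeting minutes, and press releases. Part of the data has been manually annotated as hawkish, dovish, or neutral.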
Label Mapping
- LABEL 2: Neutral
- LABEL 1: Hawkish
- LABEL 0: Dovish
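The mapping above can be written as a pair of Python dictionaries. This is a minimal sketch that assumes the label column stores these integer ids. Mappings of this shape are what the `id2label`/`label2id` arguments of a Hugging Face Transformers model config expect. The lowercase label names are a choice made for this sketch, not something the card specifies.

```python
# Label ids as listed in the dataset card's label mapping.
ID2LABEL = {0: "dovish", 1: "hawkish", 2: "neutral"}
# Inverse mapping, e.g. for converting string predictions back to ids.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(label_ids):
    """Convert a sequence of integer label ids into stance names."""
    return [ID2LABEL[i] for i in label_ids]


print(decode([0, 1, 2]))  # ['dovish', 'hawkish', 'neutral']
```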
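Counterfactual Generation Split
In addition, for the counterfactual generation task, we added a custom split in which the target classes are given in test_with_targets.csv.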
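A possible way to read the counterfactual split with Python's standard `csv` module is sketched below. The card does not document the schema of test_with_targets.csv, so the column names `sentence`, `label`, and `target` are hypothetical. The sketch also assumes that both the source label and the target class are stored as integer ids from the label mapping. Adjust both assumptions to match the real file.

```python
import csv
from typing import Dict, Iterable, List

# Label ids as listed in the dataset card's label mapping.
ID2LABEL = {0: "dovish", 1: "hawkish", 2: "neutral"}

# Hypothetical column names: the card does not document the CSV header.
TEXT_COL, LABEL_COL, TARGET_COL = "sentence", "label", "target"


def read_counterfactual_split(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines into rows holding the text, source stance, and target stance.

    Accepts any iterable of CSV lines, including an open text file.
    """
    rows = []
    for record in csv.DictReader(lines):
        source = int(record[LABEL_COL])  # assumed integer label id
        target = int(record[TARGET_COL])  # assumed integer target id
        rows.append({
            "text": record[TEXT_COL],
            "source": ID2LABEL[source],
            "target": ID2LABEL[target],
        })
    return rows
```

Because the function takes any iterable of lines, it can be passed a file opened with `open("test_with_targets.csv", newline="", encoding="utf-8")` or a list of strings in a unit test.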
Citation
If you use this dataset, please consider citing the corresponding paper:
@inproceedings{shah-etal-2023-trillion,
    title = "Trillion Dollar Words: A New Financial Dataset, Task {&} Market Analysis",
    author = "Shah, Agam and Paturi, Suvan and Chava, Sudheer",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.368",
    doi = "10.18653/v1/2023.acl-long.368",
    pages = "6664--6679",
    abstract = "Monetary policy pronouncements by Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under CC BY-NC 4.0 license.",
}



