finben-fomc
Source: ModelScope community. Updated 2025-08-18; indexed 2025-03-08.
Download link: https://modelscope.cn/datasets/TheFinAI/finben-fomc
Resource description:
---
# Dataset Card for FinBen-FOMC
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://huggingface.co/datasets/TheFinAI/finben-fomc
- **Repository:** https://huggingface.co/datasets/TheFinAI/finben-fomc
- **Paper:** FinBen: A Holistic Financial Benchmark for Large Language Models (arXiv:2402.12659)
- **Leaderboard:** https://huggingface.co/spaces/finosfoundation/Open-Financial-LLM-Leaderboard
### Dataset Summary
FinBen-FOMC is a financial sentiment classification dataset adapted from **FOMC (Shah et al., 2023a)**. The dataset is designed for training and evaluating large language models (LLMs) on classifying central bank policy stances as **Hawkish, Dovish, or Neutral**.
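When an LLM is evaluated on this task, its free-text response usually has to be mapped back to one of the three labels. The following is a minimal sketch of one way to do that, not the benchmark's own scoring code; the function name `parse_stance` and the rule "exactly one label mentioned, otherwise unparsed" are assumptions made for illustration.

```python
import re
from typing import Optional

# The three stance labels named in this card.
LABELS = ("HAWKISH", "DOVISH", "NEUTRAL")


def parse_stance(response: str) -> Optional[str]:
    """Return the single stance label mentioned in a model response.

    Hypothetical helper: returns None when the response mentions no label
    or more than one, so ambiguous outputs are not silently counted.
    """
    text = response.upper()
    found = {label for label in LABELS if re.search(rf"\b{label}\b", text)}
    return found.pop() if len(found) == 1 else None
```

Returning `None` for ambiguous responses keeps the decision about how to score them (for example, as errors) explicit in the evaluation code.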
### Supported Tasks and Leaderboards
- **Task:** Hawkish-Dovish Classification
- **Evaluation Metrics:** F1 score, accuracy
- **Test Size:** 496 instances
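The two metrics listed above can be sketched in plain Python as follows. The card does not say how F1 is averaged across the three classes, so macro averaging is an assumption here, and the function names are illustrative rather than part of any official evaluation harness.

```python
from typing import Sequence

LABELS = ("HAWKISH", "DOVISH", "NEUTRAL")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Both functions take parallel sequences of label strings, for example the `answer` column and a list of model predictions.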
### Languages
- English
## Dataset Structure
### Data Instances
Each instance consists of a structured format with the following fields:
- **id**: A unique identifier for each data instance.
- **query**: An excerpt from a central bank’s release.
- **answer**: The classification label (`HAWKISH`, `DOVISH`, or `NEUTRAL`).
### Data Fields
- **id**: Unique string identifier for the data instance.
- **query**: The input text containing an excerpt from a central bank statement.
- **answer**: The classification label (`HAWKISH`, `DOVISH`, or `NEUTRAL`).
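The field list above can be turned into a small record check. This is a sketch under two assumptions: that all three fields are stored as strings, and that labels are stored exactly as written in this card (upper case). The helper name `validate_record` is hypothetical.

```python
from typing import Any, List, Mapping

# Label spellings as given in this card; actual casing in the data is assumed.
ALLOWED_ANSWERS = {"HAWKISH", "DOVISH", "NEUTRAL"}


def validate_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in ("id", "query", "answer"):
        if not isinstance(record.get(field), str):
            problems.append(f"{field}: expected a string")
    answer = record.get("answer")
    if isinstance(answer, str) and answer not in ALLOWED_ANSWERS:
        problems.append(f"answer: unexpected label {answer!r}")
    return problems
```

Calling it on each row before evaluation surfaces schema surprises early instead of as confusing metric values later.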
### Data Splits
The dataset is split into:
- **Test:** 496 instances
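A minimal sketch of loading the split with the Hugging Face `datasets` library is shown below. The repository id is taken from the homepage URL in this card; the split name `"test"` is an assumption based on the split list above, and the helper names are illustrative.

```python
EXPECTED_TEST_SIZE = 496  # test-split size stated in this card


def load_test_split():
    """Download and return the test split.

    Requires network access and the `datasets` package. The import is done
    lazily so this module can be imported without `datasets` installed.
    """
    from datasets import load_dataset

    # Repository id from the card's homepage URL; split name is assumed.
    return load_dataset("TheFinAI/finben-fomc", split="test")


def size_matches_card(num_rows: int) -> bool:
    """Check a row count against the size stated in this card."""
    return num_rows == EXPECTED_TEST_SIZE
```

Typical use would be `ds = load_test_split()` followed by `size_matches_card(len(ds))`; a mismatch suggests the hosted data has changed since this card was written.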
## Dataset Creation
### Curation Rationale
The dataset is adapted from **FOMC (Shah et al., 2023a)** to improve its suitability for LLM-based classification tasks in central bank policy analysis.
### Source Data
#### Initial Data Collection and Normalization
The dataset originates from Federal Open Market Committee (FOMC) statements and other central bank releases.
#### Who are the source language producers?
Central bank officials and policy documents.
### Annotations
#### Annotation Process
Annotations follow a structured classification framework to label monetary policy stances.
#### Who are the annotators?
Financial experts and researchers.
### Personal and Sensitive Information
No personally identifiable information (PII) is included.
## Considerations for Using the Data
### Social Impact of Dataset
This dataset enhances financial NLP capabilities, allowing more accurate analysis of monetary policy signals.
### Discussion of Biases
Potential biases may exist due to:
- Interpretation differences in policy statements.
- Variability in central bank language across periods.
### Other Known Limitations
- Models may need financial domain knowledge to perform well on this task.
- May not generalize well to non-FOMC policy documents.
## Additional Information
### Dataset Curators
- The Fin AI Team
### Licensing Information
- **License:** CC BY-NC 4.0
### Citation Information
**Original Dataset:**
```bibtex
@inproceedings{shah2023trillion,
title={Trillion Dollar Words: A New Financial Dataset, Task \& Market Analysis},
author={Shah, Agam and Paturi, Suvan and Chava, Sudheer},
booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
editor={Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki},
pages={6664--6679},
year={2023},
organization={Association for Computational Linguistics},
address={Toronto, Canada},
doi={10.18653/v1/2023.acl-long.368}
}
```
**Adapted Version (FinBen-FOMC):**
```bibtex
@article{xie2024finben,
title={FinBen: A Holistic Financial Benchmark for Large Language Models},
author={Xie, Qianqian and others},
journal={arXiv preprint arXiv:2402.12659},
year={2024}
}
```
Provider: maas
Created: 2025-03-03



