gtfintechlab/WCB_380k_sentences
Source: Hugging Face. Updated 2025-10-20. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/gtfintechlab/WCB_380k_sentences
Resource description:
---
dataset_info:
  features:
  - name: bank
    dtype: string
  - name: year
    dtype: int64
  - name: doc_id
    dtype: string
  - name: release_date
    dtype: string
  - name: start_date
    dtype: string
  - name: end_date
    dtype: string
  - name: minutes_link
    dtype: string
  - name: cleaned_name
    dtype: string
  - name: original_name
    dtype: string
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 169841774
    num_examples: 380200
  download_size: 31197106
  dataset_size: 169841774
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-nc-sa-4.0
task_categories:
- text-classification
language:
- en
tags:
- finance
- econ
pretty_name: WCB_380k_sentences
size_categories:
- 100K<n<1M
---
## Dataset Summary
For a summary of the dataset, see [https://huggingface.co/datasets/gtfintechlab/WCB_380k_sentences](https://huggingface.co/datasets/gtfintechlab/WCB_380k_sentences).
## Additional Information
This dataset contains all sentences scraped from the documents of 25 central banks. Each sentence is accompanied by metadata, including temporal information and the document names used in our [GitHub repository](https://github.com/gtfintechlab/WorldsCentralBanks). Additional information about this dataset is available in our [paper](https://arxiv.org).
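As a rough illustration of how the single `train` split might be loaded and summarized per bank, here is a minimal sketch. It assumes the Hugging Face `datasets` library is installed; the helper names `load_wcb_sentences` and `sentences_per_bank` are hypothetical and not part of the dataset or its repository.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_wcb_sentences():
    """Load the train split from the Hugging Face Hub (needs network access).

    The import is kept inside the function so the rest of this sketch
    can be used without the `datasets` library installed.
    """
    from datasets import load_dataset

    return load_dataset("gtfintechlab/WCB_380k_sentences", split="train")


def sentences_per_bank(rows: Iterable[Mapping]) -> Counter:
    """Count how many sentences each bank contributes.

    `rows` is any iterable of records that have a "bank" key,
    for example the dataset returned by load_wcb_sentences().
    """
    return Counter(row["bank"] for row in rows)
```

Calling `sentences_per_bank(load_wcb_sentences())` should return one count per central bank. The counting helper also works on any list of dictionaries, which makes it easy to try on a small sample first.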
### Label Interpretation
- **bank:** The central bank name.
- **year:** The year of the monetary policy meeting (or equivalent meeting).
- **doc_id:** Unique document identifier, which follows the naming convention: "{bank_name}_{meeting_date}".
- **release_date:** Date when the meeting minutes (or equivalent documents) were released.
- **start_date:** Start date of the meeting.
- **end_date:** End date of the meeting.
- **minutes_link:** Link to the original minutes (or equivalent) document.
- **cleaned_name:** Cleaned version of the meeting minutes (or equivalent) document, which follows the naming convention: "{bank_name}_{meeting_date}.txt".
- **original_name:** Original version of the meeting minutes (or equivalent) document, which follows the naming convention: "{bank_name}_{meeting_date}.{file_extension}".
- **sentence:** The sentence from the minutes (or equivalent) document.
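The naming conventions above imply that `cleaned_name` and `original_name` can both be derived from `doc_id`. The sketch below checks that relationship for a single record. It assumes the conventions hold exactly as written; the function name `follows_naming_convention` and the sample record are made up for illustration, and the real format of `{meeting_date}` is not specified here.

```python
def follows_naming_convention(record: dict) -> bool:
    """Return True if a record's file names match its doc_id.

    Expected, per the field descriptions:
      cleaned_name  == "{doc_id}.txt"
      original_name == "{doc_id}.{file_extension}"
    """
    doc_id = record["doc_id"]
    cleaned_ok = record["cleaned_name"] == f"{doc_id}.txt"

    # Split on the last dot so the extension can be any non-empty string.
    stem, dot, extension = record["original_name"].rpartition(".")
    original_ok = dot == "." and stem == doc_id and extension != ""

    return cleaned_ok and original_ok


# Hypothetical record; the bank name and date format are placeholders.
example = {
    "doc_id": "examplebank_20200101",
    "cleaned_name": "examplebank_20200101.txt",
    "original_name": "examplebank_20200101.pdf",
}
print(follows_naming_convention(example))
```

A check like this can be run over the dataset to flag rows where the file names and `doc_id` have drifted apart.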
## Licensing Information
The WCB_380k_sentences dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. [More information in the paper.](https://arxiv.org)
## Citation Information
```bibtex
@article{WCBShahSukhaniPardawala,
title={Words That Unite The World: A Unified Framework for Deciphering Global Central Bank Communications},
author={Agam Shah and Siddhant Sukhani and Huzaifa Pardawala and others},
year={2025}
}
```
## Contact
For issues and questions related to the WCB_380k_sentences dataset, please contact:
- Huzaifa Pardawala: huzaifahp7[at]gatech[dot]edu
- Siddhant Sukhani: ssukhani3[at]gatech[dot]edu
- Agam Shah: ashah482[at]gatech[dot]edu
## GitHub Link
[Link to our GitHub repository.](https://github.com/gtfintechlab/WorldsCentralBanks)
Provider:
gtfintechlab



