
gtfintechlab/WCB_380k_sentences

Hugging Face, updated 2025-10-20, indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/gtfintechlab/WCB_380k_sentences
Resource description:
---
dataset_info:
  features:
  - name: bank
    dtype: string
  - name: year
    dtype: int64
  - name: doc_id
    dtype: string
  - name: release_date
    dtype: string
  - name: start_date
    dtype: string
  - name: end_date
    dtype: string
  - name: minutes_link
    dtype: string
  - name: cleaned_name
    dtype: string
  - name: original_name
    dtype: string
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 169841774
    num_examples: 380200
  download_size: 31197106
  dataset_size: 169841774
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-nc-sa-4.0
task_categories:
- text-classification
language:
- en
tags:
- finance
- econ
pretty_name: WCB_380k_sentences
size_categories:
- 100K<n<1M
---

## Dataset Summary

For the dataset summary, please refer to [https://huggingface.co/datasets/gtfintechlab/WCB_380k_sentences](https://huggingface.co/datasets/gtfintechlab/WCB_380k_sentences)

## Additional Information

This dataset contains all scraped sentences from 25 central banks. Each sentence carries metadata, including temporal information and the specific names used in our [GitHub repository](https://github.com/gtfintechlab/WorldsCentralBanks). Additional information about this dataset is available in our [paper](https://arxiv.org).

### Label Interpretation

- **bank:** The central bank name.
- **year:** The year of the monetary policy meeting (or equivalent meeting).
- **doc_id:** Unique document identifier, which follows the naming convention: "{bank_name}_{meeting_date}".
- **release_date:** Date when the meeting minutes (or equivalent documents) were released.
- **start_date:** Start date of the meeting.
- **end_date:** End date of the meeting.
- **minutes_link:** Link to the original minutes (or equivalent) document.
- **cleaned_name:** Cleaned version of the meeting minutes (or equivalent) document, which follows the naming convention: "{bank_name}_{meeting_date}.txt"
- **original_name:** Original version of the meeting minutes (or equivalent) document, which follows the naming convention: "{bank_name}_{meeting_date}.{file_extension}"
- **sentence:** The sentence from the minutes (or equivalent) document.

## Licensing Information

The WCB_380k_sentences dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. [More information in the paper.](https://arxiv.org)

## Citation Information

```bibtex
@article{WCBShahSukhaniPardawala,
  title={Words That Unite The World: A Unified Framework for Deciphering Global Central Bank Communications},
  author={Agam Shah and Siddhant Sukhani and Huzaifa Pardawala and others},
  year={2025}
}
```

## Contact

For any issues or questions related to the WCB_380k_sentences dataset, please contact:

- Huzaifa Pardawala: huzaifahp7[at]gatech[dot]edu
- Siddhant Sukhani: ssukhani3[at]gatech[dot]edu
- Agam Shah: ashah482[at]gatech[dot]edu

## GitHub Link

[Link to our GitHub repository.](https://github.com/gtfintechlab/WorldsCentralBanks)
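
## Usage Sketch

The naming conventions listed under Label Interpretation can be illustrated with a small Python sketch. It is not part of the original card and rests on assumptions: the helper names (`split_doc_id`, `cleaned_name_for`, `load_sentences`) are hypothetical, the example identifier is made up, and the split on the last underscore assumes that the meeting date itself contains no underscore (bank names may). The loader uses `datasets.load_dataset` from the Hugging Face `datasets` library and is imported lazily so that the helpers work without it.

```python
from typing import NamedTuple


class DocId(NamedTuple):
    bank_name: str
    meeting_date: str


def split_doc_id(doc_id: str) -> DocId:
    """Split a "{bank_name}_{meeting_date}" identifier on its LAST underscore.

    Assumption: the meeting date contains no underscore; the bank name may.
    """
    bank_name, sep, meeting_date = doc_id.rpartition("_")
    if not sep or not bank_name or not meeting_date:
        raise ValueError(f"unexpected doc_id format: {doc_id!r}")
    return DocId(bank_name, meeting_date)


def cleaned_name_for(doc_id: str) -> str:
    """Build the cleaned file name, "{bank_name}_{meeting_date}.txt"."""
    return f"{doc_id}.txt"


def load_sentences(repo_id: str = "gtfintechlab/WCB_380k_sentences"):
    """Load the single `train` split (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # lazy import: only needed when loading

    return load_dataset(repo_id, split="train")


if __name__ == "__main__":
    # Hypothetical identifier, used only to illustrate the convention.
    example = "example_bank_2020-01-15"
    print(split_doc_id(example))
    print(cleaned_name_for(example))
```

Calling `load_sentences()` would return the `train` split described in the metadata above; whether the download requires authentication or accepting terms on Hugging Face is not stated in the card, so treat that step as untested.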
Provider:
gtfintechlab