modestus/bitcoin_sentiment_analysis

Source: Hugging Face. Updated 2024-10-16; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/modestus/bitcoin_sentiment_analysis
Description:
---
license: apache-2.0
dataset_info:
  features:
  - name: content
    dtype: string
  - name: metrics
    list:
    - name: label
      dtype: int64
    - name: policy
      dtype: string
    - name: reasoning
      dtype: string
  splits:
  - name: train
    num_bytes: 21673740
    num_examples: 4704
  - name: test
    num_bytes: 9298292
    num_examples: 2000
  download_size: 14722722
  dataset_size: 30972032
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
task_categories:
- text-classification
language:
- en
tags:
- finance
size_categories:
- 10K<n<100K
---

We introduce **DeFine**, a specialized sentiment analysis dataset for decentralized finance (DeFi). The dataset contains roughly *6,700* cryptocurrency-related news articles (4,704 train, 2,000 test) sourced from CoinMarketCap and TradingView, with sentiment labels generated by state-of-the-art large language models (LLMs). While constructing it, we investigated the impact of chain-of-thought (CoT) prompting on LLM performance when processing complex financial texts. Our results show that CoT reasoning significantly outperforms simple and free-form prompting, particularly for smaller models, and offer key insights into how model size and architecture influence performance.
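As a minimal sketch of how a record might be read, assuming the schema declared in the front matter (`content` plus a `metrics` list of `label`/`policy`/`reasoning` entries): the helper names `load_define` and `labels_by_policy` and the sample record are hypothetical, and the card does not document what each integer label or `policy` value means, so the values below are placeholders.

```python
from typing import Any


def load_define(split: str = "train"):
    """Load one split from the Hugging Face Hub (needs `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("modestus/bitcoin_sentiment_analysis", split=split)


def labels_by_policy(record: dict[str, Any]) -> dict[str, int]:
    """Map each prompting policy in a record to the label it produced."""
    return {m["policy"]: m["label"] for m in record["metrics"]}


# Hypothetical record shaped like the declared features; values are placeholders.
sample = {
    "content": "Bitcoin rallies as exchange inflows slow.",
    "metrics": [
        {"label": 1, "policy": "policy_a", "reasoning": "placeholder"},
        {"label": 1, "policy": "policy_b", "reasoning": "placeholder"},
    ],
}

print(labels_by_policy(sample))
```

With network access, `labels_by_policy(load_define("test")[0])` should apply the same helper to a real record, provided the loaded rows match the declared schema.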
<p align="center">
  <img src="https://i.imgur.com/COwXKlE.jpeg" alt="consensus-rate-small" width="200" style="display: inline-block;"/>
  <img src="https://i.imgur.com/suNSLaO.jpeg" alt="consensus-rate-medium" width="200" style="display: inline-block;"/>
  <img src="https://i.imgur.com/NbDQ5tP.jpeg" alt="consensus-rate-large" width="200" style="display: inline-block;"/>
</p>

The training notebook is available on Colab: [notebook](https://colab.research.google.com/drive/1HlA1Oiv660CTwxtK7hK3WZWTsNoDMgEG?usp=sharing)

Evaluation results of LLMs on this dataset, including our trained versions:

| Model                          | Consensus Rate |
|--------------------------------|----------------|
| Llama-3.1-8B-Instruct-Turbo    | 0.9306         |
| Gemma-2-27b-it                 | 0.9497         |
| Llama-3.1-70B-Instruct-Turbo   | 0.9593         |
| Mixtral-8x22B-Instruct-v0.1    | 0.9480         |
| Qwen2-72B-Instruct             | 0.9517         |
| Nous-Hermes-2-Mixtral-8x7B-DPO | 0.9201         |
| Qwen2.5-3B-Instruct (Original) | 0.8947         |
| Qwen2.5-3B-Instruct (Ours)     | 0.9239         |
| Qwen2.5-7B-Instruct (Original) | 0.8808         |
| Qwen2.5-7B-Instruct (Ours)     | 0.9421         |

We hope that this dataset and evaluation framework serve as valuable tools for advancing sentiment analysis research in DeFi!
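The card does not define "Consensus Rate". One plausible reading, stated here purely as an assumption, is the fraction of articles on which a model's label matches a reference (consensus) label. The sketch below implements only that assumed definition; the function name `consensus_rate` and the toy labels are illustrative and not taken from the dataset or the notebook.

```python
from collections.abc import Sequence


def consensus_rate(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("cannot compute a rate over zero examples")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Toy labels, not taken from the dataset: three of four positions agree.
rate = consensus_rate([1, 0, 2, 1], [1, 0, 1, 1])
print(rate)  # 0.75
```

If the authors computed the metric differently (for example, against a majority vote across several models), the reference sequence would need to be built accordingly before calling this function.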
Provider:
modestus