
polibert/swik-sentiment-labels

Source: Hugging Face. Updated 2026-03-19. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/polibert/swik-sentiment-labels
Resource description:
---
license: cc-by-4.0
language:
- en
tags:
- financial-nlp
- sentiment-analysis
- aspect-based-sentiment-analysis
- financial-sentiment
- inversion-catalog
- commodities
- forex
- crypto
- finbert
pretty_name: swik Financial Sentiment Labels
size_categories:
- 10K<n<100K
---

# swik Financial Sentiment Labels

Asset-specific financial sentiment labels for 35+ securities: commodities, FX, indices, and crypto.

## What makes this different

Standard financial sentiment datasets assign generic polarity to headlines. This dataset applies **asset-specific inversion context** from the [swik inversion catalog](https://swik.io/inversions), a community-maintained knowledge base of how phrases actually move prices for each specific asset.

Example: "OPEC cuts production" → generic NLP scores **negative**. For crude oil (OIL), swik scores it **bullish**, because less supply means higher prices.

## Schema

| Column | Description |
|--------|-------------|
| `text` | Headline or news snippet |
| `security` | Asset symbol (OIL, GOLD, BTC, EURUSD, ...) |
| `security_name` | Full asset name |
| `category` | Asset category (energy, forex, crypto, ...) |
| `label` | bullish / bearish / neutral / irrelevant |
| `magnitude` | Signal strength 0–1 |
| `relevance` | How relevant headline is to this asset 0–1 |
| `confidence` | Model confidence 0–1 |
| `reasoning` | Haiku's explanation for the label |
| `source` | `human`, `ai_system` (swik Haiku), or `ai_finsenti` |
| `news_source` | Origin feed (gdelt, rss, etc.) |
| `date` | Publication date |

## Label sources

- **`human`**: community contributors via [swik.io/contribute/label](https://swik.io/contribute/label)
- **`ai_system`**: generated by Claude Haiku with the swik inversion catalog injected as context
- **`ai_finsenti`**: generated by FinSentiBot pipeline

AI labels are only as good as the inversion catalog that drives them. Without asset-specific context, ~30% of commodity and FX headlines would be mislabeled.

Filter by `source == 'human'` for the highest-quality subset.

## Coverage

| Asset | Labels |
|-------|--------|
| OIL | ~20,000 |
| LNG | ~5,700 |
| BTC | ~5,400 |
| ETH | ~4,700 |
| EURUSD | ~4,500 |
| BRENT | ~4,100 |
| ... | ... |

35 assets total. Full list in the dataset.

## License

CC BY 4.0: free to use, cite swik.

## Citation

```
@dataset{swik_sentiment_labels_2026,
  title={swik Financial Sentiment Labels},
  author={swik Community},
  year={2026},
  url={https://huggingface.co/datasets/polibert/swik-sentiment-labels},
  license={CC BY 4.0}
}
```

## Links

- Platform: [swik.io](https://swik.io)
- Inversion catalog: [swik.io/inversions](https://swik.io/inversions)
- GitHub: [github.com/polibert/sentimentwiki-catalog](https://github.com/polibert/sentimentwiki-catalog)
- Telegram: [t.me/sentimentwiki](https://t.me/sentimentwiki)
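
## Example: filtering by label source

The README suggests filtering on `source == 'human'` for the highest-quality subset. The following is a minimal sketch of that filter in plain Python, not code from the dataset authors. The helper names `select_rows` and `label_counts`, the `min_relevance` threshold, and every value in `SAMPLE_ROWS` are hypothetical and exist only for illustration; only the column names and the OIL example come from the README. Real rows would have to be loaded from the dataset first, and how the dataset is split or typed is not stated here.

```python
from collections import Counter

# Hypothetical rows that follow the README schema (subset of columns).
# The numeric values and placeholder headlines are illustrative, not real data.
SAMPLE_ROWS = [
    {"text": "OPEC cuts production", "security": "OIL",
     "label": "bullish", "relevance": 0.9, "source": "human"},
    {"text": "placeholder headline A", "security": "GOLD",
     "label": "neutral", "relevance": 0.4, "source": "ai_system"},
    {"text": "placeholder headline B", "security": "BTC",
     "label": "irrelevant", "relevance": 0.1, "source": "human"},
]


def select_rows(rows, source="human", min_relevance=0.0):
    """Keep rows from one label source whose relevance meets a threshold."""
    return [
        row for row in rows
        if row["source"] == source and row["relevance"] >= min_relevance
    ]


def label_counts(rows):
    """Count rows per (security, label) pair."""
    return Counter((row["security"], row["label"]) for row in rows)


if __name__ == "__main__":
    human_rows = select_rows(SAMPLE_ROWS)
    relevant_human_rows = select_rows(SAMPLE_ROWS, min_relevance=0.5)
    print(len(human_rows), len(relevant_human_rows))
    print(label_counts(human_rows))
```

Combining the `source` filter with a `relevance` cutoff is one possible way to drop rows labeled `irrelevant` or only loosely tied to the asset; the right threshold, if any, depends on the use case.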
Provider:
polibert