polibert/swik-sentiment-labels
Hugging Face · Updated 2026-03-19 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/polibert/swik-sentiment-labels
Resource summary:
---
license: cc-by-4.0
language:
- en
tags:
- financial-nlp
- sentiment-analysis
- aspect-based-sentiment-analysis
- financial-sentiment
- inversion-catalog
- commodities
- forex
- crypto
- finbert
pretty_name: swik Financial Sentiment Labels
size_categories:
- 10K<n<100K
---
# swik Financial Sentiment Labels
Asset-specific financial sentiment labels for 35+ securities — commodities, FX, indices, and crypto.
## What makes this different
Standard financial sentiment datasets assign generic polarity to headlines. This dataset applies **asset-specific inversion context** from the [swik inversion catalog](https://swik.io/inversions), a community-maintained knowledge base of how particular phrases actually move the price of each asset.
Example: generic NLP scores the headline "OPEC cuts production" as **negative**. For crude oil (OIL), swik labels it **bullish**, because reduced supply tends to push prices up.
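The inversion idea can be pictured as an asset-specific lookup that takes precedence over a context-free polarity score. The following is a minimal sketch only: the dictionary-based catalog, the substring matching, and the hypothetical helpers `generic_polarity` and `asset_label` are illustrative assumptions and do not reflect how swik or its catalog are actually implemented.

```python
# Hypothetical illustration only: the cue words, catalog format, and matching
# rules below are assumptions, not swik's actual data model or pipeline.

# Hypothetical generic cues: word -> polarity a context-free model might assign.
GENERIC_CUES = {"cuts": "negative", "slump": "negative", "surge": "positive"}

# Hypothetical inversion entries: (asset, phrase) -> asset-specific label.
INVERSIONS = {("OIL", "cuts production"): "bullish"}

# Naive mapping from generic polarity to a market direction.
POLARITY_TO_LABEL = {"positive": "bullish", "negative": "bearish", "neutral": "neutral"}


def generic_polarity(headline: str) -> str:
    """Context-free polarity: the first matching cue word wins, else neutral."""
    for word in headline.lower().split():
        if word in GENERIC_CUES:
            return GENERIC_CUES[word]
    return "neutral"


def asset_label(headline: str, asset: str) -> str:
    """Asset-aware label: an inversion entry for this asset overrides the generic reading."""
    text = headline.lower()
    for (entry_asset, phrase), label in INVERSIONS.items():
        if entry_asset == asset and phrase in text:
            return label
    # No inversion entry for this asset: fall back to the generic polarity.
    return POLARITY_TO_LABEL[generic_polarity(headline)]
```

With these toy entries, `generic_polarity("OPEC cuts production")` returns `"negative"`, while `asset_label("OPEC cuts production", "OIL")` returns `"bullish"` because the OIL-specific entry wins.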
## Schema
| Column | Description |
|--------|-------------|
| `text` | Headline or news snippet |
| `security` | Asset symbol (OIL, GOLD, BTC, EURUSD, ...) |
| `security_name` | Full asset name |
| `category` | Asset category (energy, forex, crypto, ...) |
| `label` | bullish / bearish / neutral / irrelevant |
| `magnitude` | Signal strength (0–1) |
| `relevance` | Relevance of the headline to this asset (0–1) |
| `confidence` | Model confidence (0–1) |
| `reasoning` | Haiku's explanation for the label |
| `source` | `human`, `ai_system` (swik Haiku), or `ai_finsenti` |
| `news_source` | Origin feed (gdelt, rss, etc.) |
| `date` | Publication date |
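A quick way to sanity-check rows against the schema above is a small validator. This is a sketch under stated assumptions: the hypothetical `schema_problems` helper treats a missing or `None` score as allowed (for example, human rows may not carry a model confidence), which the card does not specify, and the example row is illustrative, not taken from the dataset.

```python
from collections.abc import Mapping
from typing import Any

# Allowed values as documented in the schema table.
LABELS = {"bullish", "bearish", "neutral", "irrelevant"}
SOURCES = {"human", "ai_system", "ai_finsenti"}
SCORE_FIELDS = ("magnitude", "relevance", "confidence")


def schema_problems(row: Mapping[str, Any]) -> list[str]:
    """Return schema violations for one row; an empty list means the row looks valid."""
    problems = []
    if row.get("label") not in LABELS:
        problems.append(f"unexpected label: {row.get('label')!r}")
    if row.get("source") not in SOURCES:
        problems.append(f"unexpected source: {row.get('source')!r}")
    for field in SCORE_FIELDS:
        value = row.get(field)
        # Assumption: a missing score is allowed; present scores must lie in [0, 1].
        if value is None:
            continue
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{field} out of [0, 1]: {value!r}")
    return problems


# Illustrative row (not real data).
example = {
    "text": "OPEC cuts production",
    "security": "OIL",
    "label": "bullish",
    "magnitude": 0.7,
    "relevance": 0.9,
    "confidence": 0.8,
    "source": "ai_system",
}
```

For `example`, `schema_problems(example)` returns an empty list.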
## Label sources
- **`human`**: community contributors via [swik.io/contribute/label](https://swik.io/contribute/label)
- **`ai_system`**: generated by Claude Haiku with the swik inversion catalog injected as context
- **`ai_finsenti`**: generated by the FinSentiBot pipeline
AI labels are only as good as the inversion catalog that drives them. Without asset-specific context, ~30% of commodity and FX headlines would be mislabeled. Filter by `source == 'human'` for the highest-quality subset.
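Selecting the human-labeled subset is a plain filter on the `source` column. The sketch below works on any iterable of row dicts; the helper names `human_subset` and `source_counts` are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable, Mapping
from typing import Any


def human_subset(rows: Iterable[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Keep only community-labeled rows, i.e. source == 'human'."""
    return [row for row in rows if row.get("source") == "human"]


def source_counts(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count rows per label source (human, ai_system, ai_finsenti)."""
    return Counter(row.get("source") for row in rows)
```

With the Hugging Face `datasets` library, the equivalent is `load_dataset("polibert/swik-sentiment-labels", split="train").filter(lambda r: r["source"] == "human")`; the split name `"train"` is an assumption, so check the dataset page for the actual splits.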
## Coverage
| Asset | Labels |
|-------|--------|
| OIL | ~20,000 |
| LNG | ~5,700 |
| BTC | ~5,400 |
| ETH | ~4,700 |
| EURUSD | ~4,500 |
| BRENT | ~4,100 |
| ... | ... |
35 assets in total; the full list is in the dataset.
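The per-asset counts in the coverage table can be recomputed from the rows themselves. A minimal sketch, assuming rows are dicts with the `security` and `label` columns from the schema; `coverage` and `label_mix` are hypothetical helper names.

```python
from collections import Counter
from collections.abc import Iterable, Mapping
from typing import Any


def coverage(rows: Iterable[Mapping[str, Any]]) -> list[tuple[str, int]]:
    """Return (security, label count) pairs, most-labeled asset first."""
    return Counter(row["security"] for row in rows).most_common()


def label_mix(rows: Iterable[Mapping[str, Any]], security: str) -> Counter:
    """Label distribution (bullish/bearish/neutral/irrelevant) for one asset."""
    return Counter(row["label"] for row in rows if row["security"] == security)
```

Each helper iterates over `rows` once, so pass a list (not a one-shot generator) if you call both on the same data.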
## License
CC BY 4.0: free to use with attribution to swik (see Citation).
## Citation
```
@dataset{swik_sentiment_labels_2026,
title={swik Financial Sentiment Labels},
author={swik Community},
year={2026},
url={https://huggingface.co/datasets/polibert/swik-sentiment-labels},
license={CC BY 4.0}
}
```
## Links
- Platform: [swik.io](https://swik.io)
- Inversion catalog: [swik.io/inversions](https://swik.io/inversions)
- GitHub: [github.com/polibert/sentimentwiki-catalog](https://github.com/polibert/sentimentwiki-catalog)
- Telegram: [t.me/sentimentwiki](https://t.me/sentimentwiki)
Provided by:
polibert



