Kenpache/multilingual-financial-sentiment
Source: Hugging Face. Updated 2026-04-10, indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Kenpache/multilingual-financial-sentiment
Description:
---
license: apache-2.0
language:
- en
- zh
- ja
- de
- fr
- es
- ar
tags:
- finance
- sentiment-analysis
- multilingual
- financial-news
- text-classification
size_categories:
- 10K<n<100K
task_categories:
- text-classification
task_ids:
- sentiment-classification
---
# Multilingual Financial Sentiment Dataset
A curated dataset of **39,829** financial news sentences annotated with sentiment labels (Negative / Neutral / Positive) across **7 languages**, collected from **80+ financial news sources** worldwide.
## Dataset Summary
| Property | Value |
|---|---|
| **Total samples** | 39,829 |
| **Languages** | 7 (EN, ZH, JA, DE, FR, ES, AR) |
| **Labels** | 3 (negative, neutral, positive) |
| **Format** | CSV |
| **Sources** | 80+ financial news outlets |
## Languages
| Language | Code | Samples | % of Total |
|---|---|---|---|
| Japanese | ja | 8,287 | 20.8% |
| Chinese | zh | 7,930 | 19.9% |
| Spanish | es | 7,125 | 17.9% |
| English | en | 6,887 | 17.3% |
| German | de | 5,023 | 12.6% |
| French | fr | 3,935 | 9.9% |
| Arabic | ar | 642 | 1.6% |
## Label Distribution
### Overall
| Label | Count | % |
|---|---|---|
| Neutral | 18,130 | 45.5% |
| Positive | 12,257 | 30.8% |
| Negative | 9,442 | 23.7% |
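
Neutral sentences are roughly twice as common as negative ones, so a classifier trained on this data may benefit from class weighting. The following is a minimal sketch, assuming the inverse-frequency ("balanced") heuristic; this is one common choice, not something the dataset card prescribes, and the function name `balanced_class_weights` is hypothetical.

```python
# Label counts copied from the "Overall" table above.
label_counts = {"neutral": 18130, "positive": 12257, "negative": 9442}


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * count).

    Assumption: the inverse-frequency heuristic is suitable for the task;
    the dataset card itself does not recommend any weighting scheme.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(label_counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this scheme the rarest class (negative) receives the largest weight and the most common class (neutral) the smallest; the weights could then be passed to a weighted loss function in the training framework of your choice.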
### Per Language
| Language | Negative | Neutral | Positive |
|---|---|---|---|
| Japanese | 1,767 | 3,376 | 3,144 |
| Chinese | 1,921 | 3,126 | 2,883 |
| Spanish | 1,641 | 3,842 | 1,642 |
| English | 1,704 | 3,339 | 1,844 |
| German | 1,392 | 2,425 | 1,206 |
| French | 870 | 1,657 | 1,408 |
| Arabic | 147 | 365 | 130 |
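
Raw counts make it hard to compare label balance between large and small languages. As a rough sketch, the table above can be converted into per-language label shares; the counts are copied from the table, while the helper name `label_shares` is hypothetical.

```python
# Counts copied from the "Per Language" table: (negative, neutral, positive).
per_language = {
    "ja": (1767, 3376, 3144),
    "zh": (1921, 3126, 2883),
    "es": (1641, 3842, 1642),
    "en": (1704, 3339, 1844),
    "de": (1392, 2425, 1206),
    "fr": (870, 1657, 1408),
    "ar": (147, 365, 130),
}
LABELS = ("negative", "neutral", "positive")


def label_shares(counts):
    """Return each label's share of one language's samples, as fractions."""
    total = sum(counts)
    return {label: n / total for label, n in zip(LABELS, counts)}


shares = {lang: label_shares(counts) for lang, counts in per_language.items()}
for lang, lang_shares in shares.items():
    row = ", ".join(f"{label} {lang_shares[label]:.1%}" for label in LABELS)
    print(f"{lang}: {row}")
```

Viewed this way, Spanish and Arabic are each more than half neutral, which may be worth keeping in mind when comparing per-language model accuracy.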
## Sources
Data was collected from major financial news outlets across all target languages:
- **English:** CNBC, Yahoo Finance, Fortune, Bloomberg, Reuters, Barron's, Benzinga, Seeking Alpha, Kiplinger, Business Insider, Moneycontrol, Zacks, FT
- **Chinese:** Sina Finance, EastMoney, 10jqka, NBD, China Securities, 163 Finance, Hexun, STCN
- **Japanese:** Nikkan Kogyo, Nikkei, Reuters JP, Minkabu, Asahi Business, ZUU Online, Toyo Keizai, ITmedia Business, Sankei Economy
- **German:** Börse.de, NTV Börse, FAZ Finanzen, Wallstreet Online, Börse Online, OnVista, Manager Magazin, Tagesschau, Handelsblatt, WiWo, Süddeutsche
- **French:** Boursorama, Tradingsat, BFM Business, Le Revenu, L'Expansion, Capital, Le Figaro Bourse, L'AGEFI, EasyBourse
- **Spanish:** Estrategias de Inversión, Expansión, El Confidencial, Cinco Días, Bloomberg Línea, Investing.es, Bolsamanía, EFE Economía, Infobae Economía, El Financiero, Portafolio, DF.cl
- **Arabic:** Al Khaleej Economy, Al Jazeera Economy, Sabq Economy, RT Arabic Economy, Okaz Economy, Sky News Business, Maaal, Al Arabiya Economy, CNBC Arabia
## Data Format
CSV with 4 columns:
| Column | Type | Description |
|---|---|---|
| `sentence` | string | Financial news text |
| `label` | string | Sentiment: `negative`, `neutral`, or `positive` |
| `source` | string | News source identifier |
| `language` | string | ISO 639-1 language code |
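
The schema above can be checked mechanically before training. The sketch below uses only the Python standard library; the function name `find_invalid_rows` and the malformed demo row are hypothetical, and the allowed label and language values are taken from this card.

```python
import csv
import io

EXPECTED_COLUMNS = {"sentence", "label", "source", "language"}
VALID_LABELS = {"negative", "neutral", "positive"}
VALID_LANGUAGES = {"en", "zh", "ja", "de", "fr", "es", "ar"}


def find_invalid_rows(csv_file):
    """Return (row_number, reason) pairs for rows that break the documented schema."""
    reader = csv.DictReader(csv_file)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if not (row["sentence"] or "").strip():
            problems.append((row_number, "empty sentence"))
        if row["label"] not in VALID_LABELS:
            problems.append((row_number, f"unknown label: {row['label']!r}"))
        if row["language"] not in VALID_LANGUAGES:
            problems.append((row_number, f"unknown language: {row['language']!r}"))
    return problems


# In-memory demo: two rows from the card's sample data plus one made-up bad row.
sample = io.StringIO(
    "sentence,label,source,language\n"
    '"Revenue surged 40% year-over-year, beating expectations.",positive,yahoo_finance,en\n'
    "Les résultats du groupe sont conformes aux attentes.,neutral,boursorama,fr\n"
    "Hypothetical malformed row.,bullish,example_source,xx\n"
)
problems = find_invalid_rows(sample)
print(problems)
```

For a downloaded copy of the CSV, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")`.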
## Loading the Dataset
```python
from datasets import load_dataset

# Download from the Hugging Face Hub and convert the "train" split to pandas
dataset = load_dataset("Kenpache/multilingual-financial-sentiment")
df = dataset["train"].to_pandas()

# Filter by language (ISO 639-1 code)
en_data = df[df["language"] == "en"]

# Filter by label
positive = df[df["label"] == "positive"]
```
### Direct CSV Loading
```python
import pandas as pd

# The hf:// scheme requires the huggingface_hub package to be installed
df = pd.read_csv("hf://datasets/Kenpache/multilingual-financial-sentiment/all_languages_clean.csv")
print(df.head())
```
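
The loading examples above reference only a `train` split, so a held-out evaluation set may need to be carved out by the user. Given the skew across languages and labels, one option is a split stratified on the (`language`, `label`) pair. The following standard-library sketch is an assumption about how one might do this, not a split defined by the dataset; `stratified_split` and the demo rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split rows into (train, test), keeping each (language, label) group proportional."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["language"], row["label"])].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups):  # sorted so the result is reproducible
        members = groups[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Tiny synthetic demo with placeholder rows (not real dataset content).
demo_rows = [
    {"sentence": f"placeholder {i}", "label": label, "source": "demo", "language": lang}
    for lang in ("en", "de")
    for label in ("negative", "neutral", "positive")
    for i in range(10)
]
train_rows, test_rows = stratified_split(demo_rows, test_fraction=0.2, seed=0)
print(len(train_rows), len(test_rows))
```

With the real data, the rows could be supplied as `df.to_dict("records")`. Very small groups, such as the Arabic positive class with 130 samples, will contribute only a few dozen test rows at a 20% fraction, so per-language metrics for them should be read cautiously.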
## Sample Data
| sentence | label | source | language |
|---|---|---|---|
| Revenue surged 40% year-over-year, beating expectations. | positive | yahoo_finance | en |
| Die Aktie verlor nach der Gewinnwarnung deutlich an Wert. | negative | faz_finanzen | de |
| 同社の業績は前年並みで推移している。 | neutral | nikkei | ja |
| Les résultats du groupe sont conformes aux attentes. | neutral | boursorama | fr |
## License & Usage
This dataset is released **for academic and non-commercial research only**
under fair use and text and data mining exceptions (EU DSM Directive Art. 3).
Each sample is a short excerpt of at most one or two sentences, extracted for
sentiment classification research. Copyright in the original texts remains
with their respective publishers, which are identified in the `source` field.
**For commercial use**, you must obtain licenses directly from the original sources.
If you are a rights holder and want content removed, open an issue or
email alisterclrouli@gmail.com, and we will remove it within 48 hours.
## License
Apache 2.0
Provider:
Kenpache