
Kenpache/multilingual-financial-sentiment

Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/Kenpache/multilingual-financial-sentiment
---
license: apache-2.0
language:
- en
- zh
- ja
- de
- fr
- es
- ar
tags:
- finance
- sentiment-analysis
- multilingual
- financial-news
- text-classification
size_categories:
- 10K<n<100K
task_categories:
- text-classification
task_ids:
- sentiment-classification
---

# Multilingual Financial Sentiment Dataset

A curated dataset of **39,829** financial news sentences annotated with sentiment labels (Negative / Neutral / Positive) across **7 languages**, collected from **80+ financial news sources** worldwide.

## Dataset Summary

| Property | Value |
|---|---|
| **Total samples** | 39,829 |
| **Languages** | 7 (EN, ZH, JA, DE, FR, ES, AR) |
| **Labels** | 3 (negative, neutral, positive) |
| **Format** | CSV |
| **Sources** | 80+ financial news outlets |

## Languages

| Language | Code | Samples | % of Total |
|---|---|---|---|
| Japanese | ja | 8,287 | 20.8% |
| Chinese | zh | 7,930 | 19.9% |
| Spanish | es | 7,125 | 17.9% |
| English | en | 6,887 | 17.3% |
| German | de | 5,023 | 12.6% |
| French | fr | 3,935 | 9.9% |
| Arabic | ar | 642 | 1.6% |

## Label Distribution

### Overall

| Label | Count | % |
|---|---|---|
| Neutral | 18,130 | 45.5% |
| Positive | 12,257 | 30.8% |
| Negative | 9,442 | 23.7% |

### Per Language

| Language | Negative | Neutral | Positive |
|---|---|---|---|
| Japanese | 1,767 | 3,376 | 3,144 |
| Chinese | 1,921 | 3,126 | 2,883 |
| Spanish | 1,641 | 3,842 | 1,642 |
| English | 1,704 | 3,339 | 1,844 |
| German | 1,392 | 2,425 | 1,206 |
| French | 870 | 1,657 | 1,408 |
| Arabic | 147 | 365 | 130 |

## Sources

Data was collected from major financial news outlets in all seven target languages:

- **English:** CNBC, Yahoo Finance, Fortune, Bloomberg, Reuters, Barron's, Benzinga, Seeking Alpha, Kiplinger, Business Insider, Moneycontrol, Zacks, FT
- **Chinese:** Sina Finance, EastMoney, 10jqka, NBD, China Securities, 163 Finance, Hexun, STCN
- **Japanese:** Nikkan Kogyo, Nikkei, Reuters JP, Minkabu, Asahi Business, ZUU Online, Toyo Keizai, ITmedia Business, Sankei Economy
- **German:** Börse.de, NTV Börse, FAZ Finanzen, Wallstreet Online, Börse Online, OnVista, Manager Magazin, Tagesschau, Handelsblatt, WiWo, Süddeutsche
- **French:** Boursorama, Tradingsat, BFM Business, Le Revenu, L'Expansion, Capital, Le Figaro Bourse, L'AGEFI, EasyBourse
- **Spanish:** Estrategias de Inversión, Expansión, El Confidencial, Cinco Días, Bloomberg Línea, Investing.es, Bolsamanía, EFE Economía, Infobae Economía, El Financiero, Portafolio, DF.cl
- **Arabic:** Al Khaleej Economy, Al Jazeera Economy, Sabq Economy, RT Arabic Economy, Okaz Economy, Sky News Business, Maaal, Al Arabiya Economy, CNBC Arabia

## Data Format

CSV with 4 columns:

| Column | Type | Description |
|---|---|---|
| `sentence` | string | Financial news text |
| `label` | string | Sentiment: `negative`, `neutral`, or `positive` |
| `source` | string | News source identifier |
| `language` | string | ISO 639-1 language code |

## Loading the Dataset

```python
from datasets import load_dataset

# load_dataset returns a DatasetDict; the data is read from its "train" split
dataset = load_dataset("Kenpache/multilingual-financial-sentiment")
df = dataset["train"].to_pandas()

# Filter by language (ISO 639-1 code)
en_data = df[df["language"] == "en"]

# Filter by label
positive = df[df["label"] == "positive"]
```

### Direct CSV Loading

```python
import pandas as pd

# The hf:// protocol requires the huggingface_hub package to be installed
df = pd.read_csv("hf://datasets/Kenpache/multilingual-financial-sentiment/all_languages_clean.csv")
print(df.head())
```

## Sample Data

| sentence | label | source | language |
|---|---|---|---|
| Revenue surged 40% year-over-year, beating expectations. | positive | yahoo_finance | en |
| Die Aktie verlor nach der Gewinnwarnung deutlich an Wert. | negative | faz_finanzen | de |
| 同社の業績は前年並みで推移している。 | neutral | nikkei | ja |
| Les résultats du groupe sont conformes aux attentes. | neutral | boursorama | fr |

## License & Usage

This dataset is released **for academic and non-commercial research only** under fair use / text and data mining exceptions (EU DSM Directive Art. 3). Each sample is a short excerpt of one or two sentences at most, extracted for sentiment classification research. Copyright in the original texts remains with their respective publishers, identified in the `source` field. **For commercial use**, you must obtain licenses directly from the original sources. If you are a rights holder and want content removed, open an issue or email alisterclrouli@gmail.com; we will remove it within 48 hours.

## License

Apache 2.0
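## Checking the Published Counts

The count tables in this card can be cross-checked mechanically. The sketch below is illustrative and not part of the dataset's tooling: it copies the figures from the Languages, Overall, and Per Language tables into Python dictionaries (`LANGUAGES`, `LABELS`, `PER_LANGUAGE`, `pct`, and `check_tables` are hypothetical names) and asserts that every row and column sums to the stated totals and that each printed percentage equals the count divided by 39,829, rounded to one decimal place. It assumes the Per Language columns are ordered negative, neutral, positive, as in the table, and it does not read the CSV, so it checks the card's arithmetic rather than the data.

```python
# Figures copied from the Languages, Overall, and Per Language tables of this
# card. Nothing here reads the CSV; it only checks the card's own arithmetic.
TOTAL = 39829

# language code -> (samples, percentage as printed in the Languages table)
LANGUAGES = {
    "ja": (8287, 20.8),
    "zh": (7930, 19.9),
    "es": (7125, 17.9),
    "en": (6887, 17.3),
    "de": (5023, 12.6),
    "fr": (3935, 9.9),
    "ar": (642, 1.6),
}

# label -> (count, percentage as printed in the Overall table).
# Insertion order matches the column order of PER_LANGUAGE below.
LABELS = {
    "negative": (9442, 23.7),
    "neutral": (18130, 45.5),
    "positive": (12257, 30.8),
}

# language code -> (negative, neutral, positive), from the Per Language table
PER_LANGUAGE = {
    "ja": (1767, 3376, 3144),
    "zh": (1921, 3126, 2883),
    "es": (1641, 3842, 1642),
    "en": (1704, 3339, 1844),
    "de": (1392, 2425, 1206),
    "fr": (870, 1657, 1408),
    "ar": (147, 365, 130),
}


def pct(count: int) -> float:
    """Share of the full dataset, in percent, rounded to one decimal place."""
    return round(100 * count / TOTAL, 1)


def check_tables() -> None:
    """Raise AssertionError if any table disagrees with another."""
    # Both marginal tables must add up to the grand total.
    assert sum(n for n, _ in LANGUAGES.values()) == TOTAL
    assert sum(n for n, _ in LABELS.values()) == TOTAL
    for code, (n, printed) in LANGUAGES.items():
        # Each Per Language row must sum to that language's sample count.
        assert sum(PER_LANGUAGE[code]) == n, code
        assert pct(n) == printed, code
    for i, (label, (n, printed)) in enumerate(LABELS.items()):
        # Column i of the Per Language table must sum to the Overall count.
        assert sum(row[i] for row in PER_LANGUAGE.values()) == n, label
        assert pct(n) == printed, label


check_tables()
print("all tables agree")
```

Running it prints `all tables agree`. To compare the card against the actual file, the same totals can be recomputed from a DataFrame loaded with the snippets above, using `df["language"].value_counts()` and `pd.crosstab(df["language"], df["label"])`; a mismatch would mean the card and the file have drifted apart.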
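## Validating Rows Against the Schema

The Data Format table fixes four columns, and the card restricts `label` to three values and `language` to seven ISO 639-1 codes. A minimal standard-library sketch of a row check follows. It assumes the file has a header row containing exactly those four column names (in any order), which the card implies but does not state outright; `validate_rows` is a hypothetical helper, not something shipped with the dataset.

```python
import csv
import io

# Column names and allowed values taken from the Data Format table and the
# language list of this card. validate_rows is a hypothetical helper.
EXPECTED_COLUMNS = {"sentence", "label", "source", "language"}
ALLOWED_LABELS = {"negative", "neutral", "positive"}
ALLOWED_LANGUAGES = {"en", "zh", "ja", "de", "fr", "es", "ar"}


def validate_rows(csv_text: str) -> list[str]:
    """Return one message per problem found; an empty list means all records passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compare as sets so the check does not assume a particular column order.
    if set(reader.fieldnames or []) != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    # Records are numbered from 2 because record 1 is the header.
    for record_no, row in enumerate(reader, start=2):
        # A short row leaves missing fields as None, so guard before .strip().
        if not (row["sentence"] or "").strip():
            problems.append(f"record {record_no}: empty sentence")
        if row["label"] not in ALLOWED_LABELS:
            problems.append(f"record {record_no}: bad label {row['label']!r}")
        if row["language"] not in ALLOWED_LANGUAGES:
            problems.append(f"record {record_no}: bad language {row['language']!r}")
    return problems


# Usage on rows copied from the Sample Data table; the first sentence
# contains a comma, so it is quoted as standard CSV requires.
sample = (
    "sentence,label,source,language\n"
    '"Revenue surged 40% year-over-year, beating expectations.",positive,yahoo_finance,en\n'
    "Die Aktie verlor nach der Gewinnwarnung deutlich an Wert.,negative,faz_finanzen,de\n"
    "同社の業績は前年並みで推移している。,neutral,nikkei,ja\n"
)
print(validate_rows(sample))  # []
```

To check the full file, read `all_languages_clean.csv` with `encoding="utf-8"` and pass its contents to `validate_rows`. Collecting messages instead of raising on the first error lets a single pass report every bad record.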
Provider: Kenpache