susan0322/trump-truth-social
Hugging Face · updated 2026-04-10 · indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/susan0322/trump-truth-social
Description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- politics
- social-media
- truth-social
- nlp
- text-classification
- sentiment-analysis
- finance
- geopolitics
size_categories:
- 10K<n<100K
task_categories:
- text-classification
- text-generation
pretty_name: Trump Truth Social Posts Archive
dataset_info:
  features:
  - name: date
    dtype: string
  - name: time
    dtype: string
  - name: day_of_week
    dtype: string
  - name: datetime
    dtype: string
  - name: text
    dtype: string
  - name: content_html
    dtype: string
  - name: url
    dtype: string
  - name: cat_attacking_individual
    dtype: float64
  - name: cat_attacking_opposition
    dtype: float64
  - name: cat_deescalating
    dtype: float64
  - name: cat_enacting_aggressive
    dtype: float64
  - name: cat_enacting_nonaggressive
    dtype: float64
  - name: cat_other
    dtype: float64
  - name: cat_praising_endorsing
    dtype: float64
  - name: cat_self_promotion
    dtype: float64
  - name: cat_threatening_intl
    dtype: float64
  - name: post_id
    dtype: string
  - name: is_president
    dtype: bool
  - name: is_president_elect
    dtype: bool
  - name: replies_count
    dtype: int64
  - name: reblogs_count
    dtype: int64
  - name: favourites_count
    dtype: int64
  - name: media_urls
    dtype: string
  - name: links
    dtype: string
  - name: has_media
    dtype: bool
  - name: image_alt_text
    dtype: string
  - name: sp500_open
    dtype: float64
  - name: sp500_close
    dtype: float64
  - name: sp500_1hr_before
    dtype: float64
  - name: sp500_5min_before
    dtype: float64
  - name: sp500_at_post
    dtype: float64
  - name: sp500_5min_after
    dtype: float64
  - name: sp500_1hr_after
    dtype: float64
  - name: sp500_resolution
    dtype: string
  - name: dia_open
    dtype: float64
  - name: dia_close
    dtype: float64
  - name: dia_1hr_before
    dtype: float64
  - name: dia_5min_before
    dtype: float64
  - name: dia_at_post
    dtype: float64
  - name: dia_5min_after
    dtype: float64
  - name: dia_1hr_after
    dtype: float64
  - name: qqq_open
    dtype: float64
  - name: qqq_close
    dtype: float64
  - name: qqq_1hr_before
    dtype: float64
  - name: qqq_5min_before
    dtype: float64
  - name: qqq_at_post
    dtype: float64
  - name: qqq_5min_after
    dtype: float64
  - name: qqq_1hr_after
    dtype: float64
  - name: djt_open
    dtype: float64
  - name: djt_close
    dtype: float64
  - name: djt_1hr_before
    dtype: float64
  - name: djt_5min_before
    dtype: float64
  - name: djt_at_post
    dtype: float64
  - name: djt_5min_after
    dtype: float64
  - name: djt_1hr_after
    dtype: float64
  - name: lmt_open
    dtype: float64
  - name: lmt_close
    dtype: float64
  - name: lmt_1hr_before
    dtype: float64
  - name: lmt_5min_before
    dtype: float64
  - name: lmt_at_post
    dtype: float64
  - name: lmt_5min_after
    dtype: float64
  - name: lmt_1hr_after
    dtype: float64
  - name: war_open
    dtype: float64
  - name: war_close
    dtype: float64
  - name: war_1hr_before
    dtype: float64
  - name: war_5min_before
    dtype: float64
  - name: war_at_post
    dtype: float64
  - name: war_5min_after
    dtype: float64
  - name: war_1hr_after
    dtype: float64
  - name: cnrg_open
    dtype: float64
  - name: cnrg_close
    dtype: float64
  - name: cnrg_1hr_before
    dtype: float64
  - name: cnrg_5min_before
    dtype: float64
  - name: cnrg_at_post
    dtype: float64
  - name: cnrg_5min_after
    dtype: float64
  - name: cnrg_1hr_after
    dtype: float64
  - name: xlv_open
    dtype: float64
  - name: xlv_close
    dtype: float64
  - name: xlv_1hr_before
    dtype: float64
  - name: xlv_5min_before
    dtype: float64
  - name: xlv_at_post
    dtype: float64
  - name: xlv_5min_after
    dtype: float64
  - name: xlv_1hr_after
    dtype: float64
  - name: xph_open
    dtype: float64
  - name: xph_close
    dtype: float64
  - name: xph_1hr_before
    dtype: float64
  - name: xph_5min_before
    dtype: float64
  - name: xph_at_post
    dtype: float64
  - name: xph_5min_after
    dtype: float64
  - name: xph_1hr_after
    dtype: float64
  - name: gld_open
    dtype: float64
  - name: gld_close
    dtype: float64
  - name: gld_1hr_before
    dtype: float64
  - name: gld_5min_before
    dtype: float64
  - name: gld_at_post
    dtype: float64
  - name: gld_5min_after
    dtype: float64
  - name: gld_1hr_after
    dtype: float64
  - name: uso_open
    dtype: float64
  - name: uso_close
    dtype: float64
  - name: uso_1hr_before
    dtype: float64
  - name: uso_5min_before
    dtype: float64
  - name: uso_at_post
    dtype: float64
  - name: uso_5min_after
    dtype: float64
  - name: uso_1hr_after
    dtype: float64
  - name: xli_open
    dtype: float64
  - name: xli_close
    dtype: float64
  - name: xli_1hr_before
    dtype: float64
  - name: xli_5min_before
    dtype: float64
  - name: xli_at_post
    dtype: float64
  - name: xli_5min_after
    dtype: float64
  - name: xli_1hr_after
    dtype: float64
  - name: eww_open
    dtype: float64
  - name: eww_close
    dtype: float64
  - name: eww_1hr_before
    dtype: float64
  - name: eww_5min_before
    dtype: float64
  - name: eww_at_post
    dtype: float64
  - name: eww_5min_after
    dtype: float64
  - name: eww_1hr_after
    dtype: float64
  - name: vgk_open
    dtype: float64
  - name: vgk_close
    dtype: float64
  - name: vgk_1hr_before
    dtype: float64
  - name: vgk_5min_before
    dtype: float64
  - name: vgk_at_post
    dtype: float64
  - name: vgk_5min_after
    dtype: float64
  - name: vgk_1hr_after
    dtype: float64
  - name: ibit_open
    dtype: float64
  - name: ibit_close
    dtype: float64
  - name: ibit_1hr_before
    dtype: float64
  - name: ibit_5min_before
    dtype: float64
  - name: ibit_at_post
    dtype: float64
  - name: ibit_5min_after
    dtype: float64
  - name: ibit_1hr_after
    dtype: float64
  - name: fxi_open
    dtype: float64
  - name: fxi_close
    dtype: float64
  - name: fxi_1hr_before
    dtype: float64
  - name: fxi_5min_before
    dtype: float64
  - name: fxi_at_post
    dtype: float64
  - name: fxi_5min_after
    dtype: float64
  - name: fxi_1hr_after
    dtype: float64
  - name: tlt_open
    dtype: float64
  - name: tlt_close
    dtype: float64
  - name: tlt_1hr_before
    dtype: float64
  - name: tlt_5min_before
    dtype: float64
  - name: tlt_at_post
    dtype: float64
  - name: tlt_5min_after
    dtype: float64
  - name: tlt_1hr_after
    dtype: float64
  - name: uup_open
    dtype: float64
  - name: uup_close
    dtype: float64
  - name: uup_1hr_before
    dtype: float64
  - name: uup_5min_before
    dtype: float64
  - name: uup_at_post
    dtype: float64
  - name: uup_5min_after
    dtype: float64
  - name: uup_1hr_after
    dtype: float64
  - name: gdelt_military
    dtype: float64
  - name: gdelt_sanctions
    dtype: float64
  - name: gdelt_threat
    dtype: float64
  - name: gdelt_protest
    dtype: float64
  - name: gdelt_force_posture
    dtype: float64
  - name: gdelt_diplomatic
    dtype: float64
  - name: gdelt_material_conflict
    dtype: float64
  - name: gdelt_verbal_conflict
    dtype: float64
  - name: gdelt_material_cooperation
    dtype: float64
  - name: gdelt_verbal_cooperation
    dtype: float64
  - name: gdelt_goldstein_avg
    dtype: float64
  - name: gdelt_avg_tone
    dtype: float64
  - name: gdelt_total_events
    dtype: float64
  - name: gdelt_military_pct
    dtype: float64
  - name: gdelt_sanctions_pct
    dtype: float64
  - name: gdelt_threat_pct
    dtype: float64
  - name: gdelt_protest_pct
    dtype: float64
  - name: gdelt_force_posture_pct
    dtype: float64
  - name: gdelt_diplomatic_pct
    dtype: float64
  - name: gdelt_military_zscore
    dtype: float64
  - name: gdelt_sanctions_zscore
    dtype: float64
  - name: gdelt_threat_zscore
    dtype: float64
  - name: gdelt_protest_zscore
    dtype: float64
  - name: gdelt_material_conflict_zscore
    dtype: float64
  - name: gdelt_military_delta
    dtype: float64
  - name: gdelt_sanctions_delta
    dtype: float64
  - name: gdelt_threat_delta
    dtype: float64
  - name: gdelt_protest_delta
    dtype: float64
  - name: gdelt_material_conflict_delta
    dtype: float64
  - name: gdelt_goldstein_avg_delta
    dtype: float64
  - name: gdelt_avg_tone_delta
    dtype: float64
  - name: time_eastern
    dtype: string
  - name: during_market_hours
    dtype: bool
  - name: market_period
    dtype: string
  splits:
  - name: train
    num_bytes: 66826804
    num_examples: 32429
  download_size: 12214927
  dataset_size: 66826804
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Trump Truth Social Posts Archive
Public posts ("Truths") by Donald J. Trump on Truth Social, enriched with market data, geopolitical event indicators, and LLM-based post classifications. Collected for academic research purposes.
## Dataset Description
- **Source**: [CNN/Stiles Truth Social Archive](https://github.com/stiles/trump-truth-social-archive) (live-updating public archive)
- **Posts**: 32,429 as of this snapshot (growing)
- **Date range**: February 2022 – present
- **Update frequency**: Daily (Truth Social), weekly (all other sources)
- **Maintainer**: [Chris Soria](https://github.com/chrissoria) (UC Berkeley)
## Fields
### Post metadata
| Field | Type | Description |
|-------|------|-------------|
| `date` | string | Post date (YYYY-MM-DD) |
| `time` | string | Post time in UTC (HH:MM:SS) |
| `time_eastern` | string | Post time in US Eastern (HH:MM:SS, DST-aware) |
| `day_of_week` | string | Day name (Monday, Tuesday, etc.) |
| `datetime` | string | Full ISO 8601 timestamp (UTC) |
| `text` | string | Plain text content (HTML stripped) |
| `content_html` | string | Original HTML content |
| `url` | string | Direct link to post on Truth Social |
| `post_id` | string | Truth Social post ID |
| `is_president` | bool | Whether Trump was serving as president at time of post |
| `is_president_elect` | bool | Whether Trump was president-elect at time of post |
| `during_market_hours` | bool | Whether post was made during US market hours (9:30 AM – 4:00 PM ET, weekdays) |
| `market_period` | string | One of: `before_market`, `during_market`, `after_market` |
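The Eastern-time fields can in principle be re-derived from the UTC `datetime` column. The following is a minimal sketch, not the maintainer's code. It assumes the market window is half-open (9:30 inclusive, 16:00 exclusive), that weekend posts are labeled by clock time alone, and that exchange holidays are ignored; `market_fields` is a hypothetical helper name.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def market_fields(datetime_utc: str) -> dict:
    """Derive Eastern-time fields from an ISO 8601 UTC timestamp string."""
    # Normalize a trailing "Z" so older Python versions can parse it.
    ts = datetime.fromisoformat(datetime_utc.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Assumption: naive timestamps are UTC, as the field table states.
        ts = ts.replace(tzinfo=timezone.utc)
    local = ts.astimezone(EASTERN)  # handles EST/EDT switches
    clock = local.time()
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    during = is_weekday and MARKET_OPEN <= clock < MARKET_CLOSE
    if during:
        period = "during_market"
    elif clock < MARKET_OPEN:
        period = "before_market"
    else:
        # Assumption: anything else, including weekend daytime, is "after".
        period = "after_market"
    return {
        "time_eastern": local.strftime("%H:%M:%S"),
        "during_market_hours": during,
        "market_period": period,
    }
```

Comparing this output against the shipped `during_market_hours` and `market_period` columns is a quick way to check which boundary and weekend conventions the dataset actually uses.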
### Engagement
| Field | Type | Description |
|-------|------|-------------|
| `replies_count` | int | Number of replies |
| `reblogs_count` | int | Number of re-truths (reposts) |
| `favourites_count` | int | Number of likes |
### Media
| Field | Type | Description |
|-------|------|-------------|
| `media_urls` | string | Semicolon-separated image/video URLs attached to the post |
| `links` | string | Semicolon-separated URLs found in post text |
| `has_media` | bool | Whether post contains media attachments |
| `image_alt_text` | string | AI-generated factual image description for accessibility (in progress) |
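Because `media_urls` and `links` are stored as semicolon-separated strings, they usually need splitting before use. A small sketch, assuming empty strings, `None`, and NaN all mean "no URLs" (`split_urls` is a hypothetical helper, not part of the dataset tooling):

```python
def split_urls(field):
    """Split a semicolon-separated URL field into a list of clean URLs."""
    # Treat None and NaN (NaN is the only float unequal to itself) as empty.
    if field is None or (isinstance(field, float) and field != field):
        return []
    return [part.strip() for part in str(field).split(";") if part.strip()]
```

With pandas, this can be applied column-wise, for example `df["media_urls"].map(split_urls)`.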
### Post classification (5-model ensemble)
LLM-classified post categories using a 5-model unanimous-vote ensemble (Llama 4 Maverick, Qwen3-32B, Claude 3 Haiku, GPT-4o-mini, Gemini 2.0 Flash). Multi-label: a post can belong to multiple categories. Available for posts with text since Nov 5, 2024 (election day onwards). Values: 1 = present, 0 = not present.
| Field | Type | Description |
|-------|------|-------------|
| `cat_attacking_individual` | float | Targeting a specific person by name |
| `cat_attacking_opposition` | float | Targeting Democrats, a party, or political group broadly |
| `cat_threatening_intl` | float | Conditional threats, tariff warnings, military posturing |
| `cat_enacting_aggressive` | float | Imposing tariffs, sanctions, bans, military action (already done) |
| `cat_enacting_nonaggressive` | float | Signing bills, executive orders, domestic programs, appointments |
| `cat_deescalating` | float | Toning down, announcing deals, peace talks, ceasefire |
| `cat_praising_endorsing` | float | Positive statements about a person, leader, ally |
| `cat_self_promotion` | float | Boasting about achievements, economy, polls, ratings |
| `cat_other` | float | Does not fit any above category |
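Since the category columns are multi-label floats, one convenient step is collapsing them into a list of labels per post. The sketch below assumes that posts outside the classified range carry missing values (None or NaN) in every `cat_*` column, which the card implies but does not state outright; `active_categories` is a hypothetical helper.

```python
import math

CATEGORY_COLUMNS = [
    "cat_attacking_individual",
    "cat_attacking_opposition",
    "cat_threatening_intl",
    "cat_enacting_aggressive",
    "cat_enacting_nonaggressive",
    "cat_deescalating",
    "cat_praising_endorsing",
    "cat_self_promotion",
    "cat_other",
]


def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def active_categories(row):
    """Return the labels set to 1, or None if the post was never classified.

    `row` is any mapping with a .get method (a dict or a pandas Series).
    """
    values = {col: row.get(col) for col in CATEGORY_COLUMNS}
    if all(_is_missing(v) for v in values.values()):
        return None  # assumption: all-missing means "not classified"
    # Strip the "cat_" prefix for readability.
    return [col[len("cat_"):] for col, v in values.items()
            if not _is_missing(v) and v == 1]
```

Returning `None` rather than an empty list keeps unclassified posts distinguishable from classified posts that matched no category.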
### Market data (18 tickers)
Each ticker has 7 columns following the pattern `{ticker}_{metric}`, for example `sp500_at_post`. Daily open/close prices are available for all posts. The five intraday prices (`_1hr_before` through `_1hr_after`) use the highest available resolution, which depends on how recent the post is: 1-minute (last ~7 days), 5-minute (last ~60 days), or hourly (last ~2 years). Weekend/holiday posts use the most recent trading day. The additional `sp500_resolution` column indicates the intraday data resolution.
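The resolution tiers described above can be sketched as a simple age lookup. This is an illustration only: the thresholds are the approximate ones quoted here, age is measured against the time the data was fetched (passed in as `now`), and the labels `"1m"`, `"5m"`, and `"1h"` are hypothetical and may not match the strings stored in `sp500_resolution`.

```python
from datetime import datetime, timedelta, timezone


def expected_resolution(post_time: datetime, now: datetime):
    """Return the finest intraday resolution expected for a post's age."""
    age = now - post_time
    if age <= timedelta(days=7):
        return "1m"   # 1-minute bars
    if age <= timedelta(days=60):
        return "5m"   # 5-minute bars
    if age <= timedelta(days=730):
        return "1h"   # hourly bars (~2 years)
    return None       # assumption: no intraday data for older posts


fetch_time = datetime(2026, 4, 10, tzinfo=timezone.utc)
recent = expected_resolution(fetch_time - timedelta(days=3), fetch_time)
```

One practical consequence is that `_5min_before` and `_5min_after` values taken from hourly bars cannot resolve a true five-minute move, so filtering on the resolution column before intraday analysis seems prudent.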
**Metrics per ticker:**
| Suffix | Description |
|--------|-------------|
| `_open` | Daily open price |
| `_close` | Daily close price |
| `_1hr_before` | Price 1 hour before the post |
| `_5min_before` | Price 5 minutes before the post |
| `_at_post` | Price at time of post |
| `_5min_after` | Price 5 minutes after the post |
| `_1hr_after` | Price 1 hour after the post |
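A typical use of these columns is computing percentage moves around a post. The sketch below assumes missing prices appear as None or NaN and returns `None` in that case; `pct_change` and `post_window_returns` are hypothetical helper names.

```python
import math


def pct_change(before, after):
    """Percentage change from `before` to `after`, or None if unavailable."""
    if before is None or after is None:
        return None
    if math.isnan(before) or math.isnan(after) or before == 0:
        return None
    return (after - before) / before * 100.0


def post_window_returns(row, prefix):
    """Compute moves around a post for one ticker prefix, e.g. "sp500_"."""
    at_post = row.get(f"{prefix}at_post")
    return {
        # Move in the hour leading up to the post.
        "pre_1hr": pct_change(row.get(f"{prefix}1hr_before"), at_post),
        # Moves measured from the price at the time of the post.
        "post_5min": pct_change(at_post, row.get(f"{prefix}5min_after")),
        "post_1hr": pct_change(at_post, row.get(f"{prefix}1hr_after")),
    }
```

Comparing `pre_1hr` with `post_1hr` gives a rough check on whether a move was already under way before the post, which matters for any causal reading.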
**Tickers:**
| Prefix | Ticker | Name | Category |
|--------|--------|------|----------|
| `sp500_` | ^GSPC | S&P 500 | Broad market |
| `dia_` | DIA | SPDR Dow Jones Industrial Average ETF | Broad market |
| `qqq_` | QQQ | Invesco QQQ (Nasdaq-100) | Tech/growth |
| `djt_` | DJT | Trump Media & Technology Group | Trump-linked |
| `lmt_` | LMT | Lockheed Martin | Defense |
| `war_` | WAR | U.S. Global Technology and Aerospace & Defense ETF | Defense |
| `xli_` | XLI | Industrial Select Sector SPDR | Industrials |
| `xlv_` | XLV | Health Care Select Sector SPDR | Healthcare |
| `xph_` | XPH | SPDR S&P Pharmaceuticals ETF | Pharma |
| `cnrg_` | CNRG | SPDR S&P Kensho Clean Power ETF | Clean energy |
| `gld_` | GLD | SPDR Gold Shares | Gold/commodities |
| `uso_` | USO | United States Oil Fund | Oil/energy |
| `fxi_` | FXI | iShares China Large-Cap ETF | China/trade |
| `eww_` | EWW | iShares MSCI Mexico ETF | Mexico/trade |
| `vgk_` | VGK | Vanguard FTSE Europe ETF | Europe |
| `ibit_` | IBIT | iShares Bitcoin Trust ETF | Crypto |
| `tlt_` | TLT | iShares 20+ Year Treasury Bond ETF | Bonds/rates |
| `uup_` | UUP | Invesco DB US Dollar Index | USD strength |
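With 18 prefixes and 7 suffixes, the price columns can be generated rather than typed out. A short sketch based on the two tables above (`market_columns` is a hypothetical helper):

```python
PREFIXES = [
    "sp500_", "dia_", "qqq_", "djt_", "lmt_", "war_", "xli_", "xlv_", "xph_",
    "cnrg_", "gld_", "uso_", "fxi_", "eww_", "vgk_", "ibit_", "tlt_", "uup_",
]
SUFFIXES = [
    "open", "close", "1hr_before", "5min_before",
    "at_post", "5min_after", "1hr_after",
]


def market_columns(prefixes=PREFIXES):
    """List every price column for the given ticker prefixes."""
    return [f"{prefix}{suffix}" for prefix in prefixes for suffix in SUFFIXES]


all_price_columns = market_columns()            # 18 * 7 = 126 names
defense_columns = market_columns(["lmt_", "war_"])
```

Note that `sp500_resolution` is a string column outside this pattern and is deliberately not generated.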
### GDELT geopolitical events (daily)
Daily aggregates of US-involved events from the [GDELT Project](https://www.gdeltproject.org/) via BigQuery. Each row gets the event counts for its post date. Based on CAMEO event coding of global news coverage.
**Note:** GDELT daily exports are typically available with a ~1 day lag. Posts from the most recent day may have null GDELT columns until the next daily update backfills them.
**Raw counts** (the Type column gives the logical type; the schema stores every GDELT column as `float64`):
| Field | Type | Description |
|-------|------|-------------|
| `gdelt_military` | int | US military assault/force/mass violence events (CAMEO 18-20) |
| `gdelt_sanctions` | int | Sanctions/embargo events (CAMEO 17) |
| `gdelt_threat` | int | Threat events (CAMEO 13) |
| `gdelt_protest` | int | Protest events (CAMEO 14) |
| `gdelt_force_posture` | int | Force posturing events (CAMEO 15) |
| `gdelt_diplomatic` | int | Diplomatic cooperation events (CAMEO 01-08) |
| `gdelt_material_conflict` | int | Material conflict events (QuadClass 4) |
| `gdelt_verbal_conflict` | int | Verbal conflict events (QuadClass 3) |
| `gdelt_material_cooperation` | int | Material cooperation events (QuadClass 2) |
| `gdelt_verbal_cooperation` | int | Verbal cooperation events (QuadClass 1) |
| `gdelt_goldstein_avg` | float | Average Goldstein scale for the day (-10 = max conflict, +10 = max cooperation) |
| `gdelt_avg_tone` | float | Average news tone for the day (negative = negative coverage) |
| `gdelt_total_events` | int | Total US-involved events |
**Derived:**
| Suffix | Description |
|--------|-------------|
| `_pct` | Share of total events (e.g., `gdelt_military_pct` = military events as % of total) |
| `_zscore` | Standard deviations above/below historical mean (flags unusual days) |
| `_delta` | Day-over-day change from previous day |
Available for: `military`, `sanctions`, `threat`, `protest`, `force_posture`, `diplomatic` (pct); `military`, `sanctions`, `threat`, `protest`, `material_conflict` (zscore and delta); `goldstein_avg`, `avg_tone` (delta).
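The three derived metrics can be sketched from a date-ordered series of daily counts. The card does not specify the details, so this example makes several assumptions: `_pct` is on a 0 to 100 scale, `_zscore` uses the mean and population standard deviation of the whole series as the "historical" baseline, and `_delta` is today's value minus the previous day's. `derive_metrics` is a hypothetical helper.

```python
from statistics import mean, pstdev


def derive_metrics(counts, totals):
    """Compute pct, zscore, and delta lists from daily counts.

    `counts` and `totals` are equal-length lists in date order.
    """
    # Share of total events, as a percentage (None when total is zero).
    pct = [100.0 * c / t if t else None for c, t in zip(counts, totals)]
    # Standard deviations from the series mean (None if no variation).
    mu = mean(counts)
    sigma = pstdev(counts)
    zscore = [(c - mu) / sigma if sigma else None for c in counts]
    # Day-over-day change; the first day has no predecessor.
    delta = [None] + [b - a for a, b in zip(counts, counts[1:])]
    return pct, zscore, delta
```

If the shipped `_zscore` values differ from this calculation, the most likely cause is a different baseline window, such as a trailing rather than full-sample mean.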
## Intended Use
This dataset is intended for **academic research** in political science, computational social science, NLP, finance, and related fields. Example use cases:
- Analyzing the relationship between presidential social media activity and market movements
- Studying the timing and framing of aggressive policy announcements
- Discourse analysis and political communication research
- Event-driven analysis correlating posts with GDELT geopolitical indicators
- Accessibility research using AI-generated image descriptions
## Fair Use Notice
This dataset is compiled from publicly available posts by a public figure for academic research purposes under fair use (17 U.S.C. § 107). The data consists of factual records of public political speech. Source data is from the [CNN/Stiles public archive](https://github.com/stiles/trump-truth-social-archive). Market data sourced from Yahoo Finance via yfinance. Geopolitical data from the GDELT Project. Multiple peer-reviewed publications have established precedent for academic use of Truth Social data (see [ICWSM 2023](https://arxiv.org/abs/2303.11240), [arXiv:2411.01330](https://arxiv.org/abs/2411.01330)).
## Citation
If you use this dataset in your research, please cite this dataset and the underlying data sources:
### This dataset
```bibtex
@misc{soria2026trump_truth_social,
title={Trump Truth Social Posts Archive},
author={Soria, Christopher},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/datasets/chrissoria/trump-truth-social}
}
```
### Source data: Truth Social posts
The raw post data is sourced from Matt Stiles' CNN Truth Social archive:
```bibtex
@misc{stiles2024truthsocial,
title={Trump Truth Social Archive},
author={Stiles, Matt},
year={2024},
publisher={CNN},
url={https://github.com/stiles/trump-truth-social-archive}
}
```
### Market data: Yahoo Finance
Stock and ETF price data is sourced from Yahoo Finance via the [yfinance](https://github.com/ranaroussi/yfinance) Python library:
```bibtex
@software{yfinance,
title={yfinance: Download market data from Yahoo! Finance API},
author={Aroussi, Ran},
url={https://github.com/ranaroussi/yfinance},
license={Apache-2.0}
}
```
### Geopolitical events: GDELT Project
Daily geopolitical event aggregates are sourced from the [GDELT Project](https://www.gdeltproject.org/):
```bibtex
@article{leetaru2013gdelt,
title={GDELT: Global Data on Events, Location and Tone, 1979--2012},
author={Leetaru, Kalev and Schrodt, Philip A.},
journal={ISA Annual Convention},
year={2013},
url={https://www.gdeltproject.org/}
}
```
### LLM classification and image descriptions
Post classifications were generated using [cat-stack](https://github.com/chrissoria/cat-stack) with a 5-model ensemble (Llama 4 Maverick, Qwen3-32B, Claude 3 Haiku, GPT-4o-mini, Gemini 2.0 Flash). Image descriptions were generated using Qwen2.5-VL-72B.
```bibtex
@software{soria2026catstack,
title={cat-stack: Domain-agnostic text, image, and PDF classification engine powered by LLMs},
author={Soria, Christopher},
year={2026},
url={https://github.com/chrissoria/cat-stack}
}
```
## Part of the cat-pol ecosystem
This dataset is part of the [cat-pol](https://github.com/chrissoria/cat-pol) political text analysis toolkit. Install with:
```bash
pip install "cat-pol[sources]"
```
```python
from cat_pol.sources import fetch_trump_truths
df = fetch_trump_truths(since="2024-01-01")
```
Provider:
susan0322



