
susan0322/trump-truth-social

Source: Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/susan0322/trump-truth-social
Resource description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- politics
- social-media
- truth-social
- nlp
- text-classification
- sentiment-analysis
- finance
- geopolitics
size_categories:
- 10K<n<100K
task_categories:
- text-classification
- text-generation
pretty_name: Trump Truth Social Posts Archive
dataset_info:
  features:
  - name: date
    dtype: string
  - name: time
    dtype: string
  - name: day_of_week
    dtype: string
  - name: datetime
    dtype: string
  - name: text
    dtype: string
  - name: content_html
    dtype: string
  - name: url
    dtype: string
  - name: cat_attacking_individual
    dtype: float64
  - name: cat_attacking_opposition
    dtype: float64
  - name: cat_deescalating
    dtype: float64
  - name: cat_enacting_aggressive
    dtype: float64
  - name: cat_enacting_nonaggressive
    dtype: float64
  - name: cat_other
    dtype: float64
  - name: cat_praising_endorsing
    dtype: float64
  - name: cat_self_promotion
    dtype: float64
  - name: cat_threatening_intl
    dtype: float64
  - name: post_id
    dtype: string
  - name: is_president
    dtype: bool
  - name: is_president_elect
    dtype: bool
  - name: replies_count
    dtype: int64
  - name: reblogs_count
    dtype: int64
  - name: favourites_count
    dtype: int64
  - name: media_urls
    dtype: string
  - name: links
    dtype: string
  - name: has_media
    dtype: bool
  - name: image_alt_text
    dtype: string
  - name: sp500_open
    dtype: float64
  - name: sp500_close
    dtype: float64
  - name: sp500_1hr_before
    dtype: float64
  - name: sp500_5min_before
    dtype: float64
  - name: sp500_at_post
    dtype: float64
  - name: sp500_5min_after
    dtype: float64
  - name: sp500_1hr_after
    dtype: float64
  - name: sp500_resolution
    dtype: string
  - name: dia_open
    dtype: float64
  - name: dia_close
    dtype: float64
  - name: dia_1hr_before
    dtype: float64
  - name: dia_5min_before
    dtype: float64
  - name: dia_at_post
    dtype: float64
  - name: dia_5min_after
    dtype: float64
  - name: dia_1hr_after
    dtype: float64
  - name: qqq_open
    dtype: float64
  - name: qqq_close
    dtype: float64
  - name: qqq_1hr_before
    dtype: float64
  - name: qqq_5min_before
    dtype: float64
  - name: qqq_at_post
    dtype: float64
  - name: qqq_5min_after
    dtype: float64
  - name: qqq_1hr_after
    dtype: float64
  - name: djt_open
    dtype: float64
  - name: djt_close
    dtype: float64
  - name: djt_1hr_before
    dtype: float64
  - name: djt_5min_before
    dtype: float64
  - name: djt_at_post
    dtype: float64
  - name: djt_5min_after
    dtype: float64
  - name: djt_1hr_after
    dtype: float64
  - name: lmt_open
    dtype: float64
  - name: lmt_close
    dtype: float64
  - name: lmt_1hr_before
    dtype: float64
  - name: lmt_5min_before
    dtype: float64
  - name: lmt_at_post
    dtype: float64
  - name: lmt_5min_after
    dtype: float64
  - name: lmt_1hr_after
    dtype: float64
  - name: war_open
    dtype: float64
  - name: war_close
    dtype: float64
  - name: war_1hr_before
    dtype: float64
  - name: war_5min_before
    dtype: float64
  - name: war_at_post
    dtype: float64
  - name: war_5min_after
    dtype: float64
  - name: war_1hr_after
    dtype: float64
  - name: cnrg_open
    dtype: float64
  - name: cnrg_close
    dtype: float64
  - name: cnrg_1hr_before
    dtype: float64
  - name: cnrg_5min_before
    dtype: float64
  - name: cnrg_at_post
    dtype: float64
  - name: cnrg_5min_after
    dtype: float64
  - name: cnrg_1hr_after
    dtype: float64
  - name: xlv_open
    dtype: float64
  - name: xlv_close
    dtype: float64
  - name: xlv_1hr_before
    dtype: float64
  - name: xlv_5min_before
    dtype: float64
  - name: xlv_at_post
    dtype: float64
  - name: xlv_5min_after
    dtype: float64
  - name: xlv_1hr_after
    dtype: float64
  - name: xph_open
    dtype: float64
  - name: xph_close
    dtype: float64
  - name: xph_1hr_before
    dtype: float64
  - name: xph_5min_before
    dtype: float64
  - name: xph_at_post
    dtype: float64
  - name: xph_5min_after
    dtype: float64
  - name: xph_1hr_after
    dtype: float64
  - name: gld_open
    dtype: float64
  - name: gld_close
    dtype: float64
  - name: gld_1hr_before
    dtype: float64
  - name: gld_5min_before
    dtype: float64
  - name: gld_at_post
    dtype: float64
  - name: gld_5min_after
    dtype: float64
  - name: gld_1hr_after
    dtype: float64
  - name: uso_open
    dtype: float64
  - name: uso_close
    dtype: float64
  - name: uso_1hr_before
    dtype: float64
  - name: uso_5min_before
    dtype: float64
  - name: uso_at_post
    dtype: float64
  - name: uso_5min_after
    dtype: float64
  - name: uso_1hr_after
    dtype: float64
  - name: xli_open
    dtype: float64
  - name: xli_close
    dtype: float64
  - name: xli_1hr_before
    dtype: float64
  - name: xli_5min_before
    dtype: float64
  - name: xli_at_post
    dtype: float64
  - name: xli_5min_after
    dtype: float64
  - name: xli_1hr_after
    dtype: float64
  - name: eww_open
    dtype: float64
  - name: eww_close
    dtype: float64
  - name: eww_1hr_before
    dtype: float64
  - name: eww_5min_before
    dtype: float64
  - name: eww_at_post
    dtype: float64
  - name: eww_5min_after
    dtype: float64
  - name: eww_1hr_after
    dtype: float64
  - name: vgk_open
    dtype: float64
  - name: vgk_close
    dtype: float64
  - name: vgk_1hr_before
    dtype: float64
  - name: vgk_5min_before
    dtype: float64
  - name: vgk_at_post
    dtype: float64
  - name: vgk_5min_after
    dtype: float64
  - name: vgk_1hr_after
    dtype: float64
  - name: ibit_open
    dtype: float64
  - name: ibit_close
    dtype: float64
  - name: ibit_1hr_before
    dtype: float64
  - name: ibit_5min_before
    dtype: float64
  - name: ibit_at_post
    dtype: float64
  - name: ibit_5min_after
    dtype: float64
  - name: ibit_1hr_after
    dtype: float64
  - name: fxi_open
    dtype: float64
  - name: fxi_close
    dtype: float64
  - name: fxi_1hr_before
    dtype: float64
  - name: fxi_5min_before
    dtype: float64
  - name: fxi_at_post
    dtype: float64
  - name: fxi_5min_after
    dtype: float64
  - name: fxi_1hr_after
    dtype: float64
  - name: tlt_open
    dtype: float64
  - name: tlt_close
    dtype: float64
  - name: tlt_1hr_before
    dtype: float64
  - name: tlt_5min_before
    dtype: float64
  - name: tlt_at_post
    dtype: float64
  - name: tlt_5min_after
    dtype: float64
  - name: tlt_1hr_after
    dtype: float64
  - name: uup_open
    dtype: float64
  - name: uup_close
    dtype: float64
  - name: uup_1hr_before
    dtype: float64
  - name: uup_5min_before
    dtype: float64
  - name: uup_at_post
    dtype: float64
  - name: uup_5min_after
    dtype: float64
  - name: uup_1hr_after
    dtype: float64
  - name: gdelt_military
    dtype: float64
  - name: gdelt_sanctions
    dtype: float64
  - name: gdelt_threat
    dtype: float64
  - name: gdelt_protest
    dtype: float64
  - name: gdelt_force_posture
    dtype: float64
  - name: gdelt_diplomatic
    dtype: float64
  - name: gdelt_material_conflict
    dtype: float64
  - name: gdelt_verbal_conflict
    dtype: float64
  - name: gdelt_material_cooperation
    dtype: float64
  - name: gdelt_verbal_cooperation
    dtype: float64
  - name: gdelt_goldstein_avg
    dtype: float64
  - name: gdelt_avg_tone
    dtype: float64
  - name: gdelt_total_events
    dtype: float64
  - name: gdelt_military_pct
    dtype: float64
  - name: gdelt_sanctions_pct
    dtype: float64
  - name: gdelt_threat_pct
    dtype: float64
  - name: gdelt_protest_pct
    dtype: float64
  - name: gdelt_force_posture_pct
    dtype: float64
  - name: gdelt_diplomatic_pct
    dtype: float64
  - name: gdelt_military_zscore
    dtype: float64
  - name: gdelt_sanctions_zscore
    dtype: float64
  - name: gdelt_threat_zscore
    dtype: float64
  - name: gdelt_protest_zscore
    dtype: float64
  - name: gdelt_material_conflict_zscore
    dtype: float64
  - name: gdelt_military_delta
    dtype: float64
  - name: gdelt_sanctions_delta
    dtype: float64
  - name: gdelt_threat_delta
    dtype: float64
  - name: gdelt_protest_delta
    dtype: float64
  - name: gdelt_material_conflict_delta
    dtype: float64
  - name: gdelt_goldstein_avg_delta
    dtype: float64
  - name: gdelt_avg_tone_delta
    dtype: float64
  - name: time_eastern
    dtype: string
  - name: during_market_hours
    dtype: bool
  - name: market_period
    dtype: string
  splits:
  - name: train
    num_bytes: 66826804
    num_examples: 32429
  download_size: 12214927
  dataset_size: 66826804
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Trump Truth Social Posts Archive

Public posts ("Truths") by Donald J. Trump on Truth Social, enriched with market data, geopolitical event indicators, and LLM-based post classifications. Collected for academic research purposes.

## Dataset Description

- **Source**: [CNN/Stiles Truth Social Archive](https://github.com/stiles/trump-truth-social-archive) (live-updating public archive)
- **Posts**: ~32,000+ (growing)
- **Date range**: February 2022 – present
- **Update frequency**: Daily (Truth Social), weekly (all other sources)
- **Maintainer**: [Chris Soria](https://github.com/chrissoria) (UC Berkeley)

## Fields

### Post metadata

| Field | Type | Description |
|-------|------|-------------|
| `date` | string | Post date (YYYY-MM-DD) |
| `time` | string | Post time in UTC (HH:MM:SS) |
| `time_eastern` | string | Post time in US Eastern (HH:MM:SS, DST-aware) |
| `day_of_week` | string | Day name (Monday, Tuesday, etc.) |
| `datetime` | string | Full ISO 8601 timestamp (UTC) |
| `text` | string | Plain text content (HTML stripped) |
| `content_html` | string | Original HTML content |
| `url` | string | Direct link to post on Truth Social |
| `post_id` | string | Truth Social post ID |
| `is_president` | bool | Whether Trump was serving as president at time of post |
| `is_president_elect` | bool | Whether Trump was president-elect at time of post |
| `during_market_hours` | bool | Whether post was made during US market hours (9:30 AM – 4:00 PM ET, weekdays) |
| `market_period` | string | One of: `before_market`, `during_market`, `after_market` |

### Engagement

| Field | Type | Description |
|-------|------|-------------|
| `replies_count` | int | Number of replies |
| `reblogs_count` | int | Number of re-truths (reposts) |
| `favourites_count` | int | Number of likes |

### Media

| Field | Type | Description |
|-------|------|-------------|
| `media_urls` | string | Semicolon-separated image/video URLs attached to the post |
| `links` | string | Semicolon-separated URLs found in post text |
| `has_media` | bool | Whether post contains media attachments |
| `image_alt_text` | string | AI-generated factual image description for accessibility (in progress) |

### Post classification (5-model ensemble)

LLM-classified post categories using a 5-model unanimous-vote ensemble (Llama 4 Maverick, Qwen3-32B, Claude 3 Haiku, GPT-4o-mini, Gemini 2.0 Flash). Multi-label: a post can belong to multiple categories. Available for posts with text since Nov 5, 2024 (election day onwards). Values: 1 = present, 0 = not present.

| Field | Type | Description |
|-------|------|-------------|
| `cat_attacking_individual` | float | Targeting a specific person by name |
| `cat_attacking_opposition` | float | Targeting Democrats, a party, or political group broadly |
| `cat_threatening_intl` | float | Conditional threats, tariff warnings, military posturing |
| `cat_enacting_aggressive` | float | Imposing tariffs, sanctions, bans, military action (already done) |
| `cat_enacting_nonaggressive` | float | Signing bills, executive orders, domestic programs, appointments |
| `cat_deescalating` | float | Toning down, announcing deals, peace talks, ceasefire |
| `cat_praising_endorsing` | float | Positive statements about a person, leader, ally |
| `cat_self_promotion` | float | Boasting about achievements, economy, polls, ratings |
| `cat_other` | float | Does not fit any above category |

### Market data (18 tickers)

Each ticker has 7 columns following the pattern `{ticker}_{metric}`. Daily open/close prices are available for all posts. Intraday prices (1hr before through 1hr after) use the highest available resolution: 1-minute (last ~7 days), 5-minute (last ~60 days), or hourly (last ~2 years). Weekend/holiday posts use the most recent trading day. The `sp500_resolution` column indicates the intraday data resolution.

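As an illustration of how the classification flags and the `{ticker}_{metric}` columns described above might be combined, the following minimal sketch selects posts flagged with one category and computes the percent price change from `_at_post` to the two post-window prices. The helper names `pct_change` and `window_returns` are hypothetical, the rows are made-up toy values rather than real data, and treating missing prices as `None` is an assumption about how nulls appear after loading.

```python
from typing import Dict, List, Optional


def pct_change(before: Optional[float], after: Optional[float]) -> Optional[float]:
    """Percent change from `before` to `after`, or None if a price is missing."""
    if before is None or after is None or before == 0:
        return None
    return (after - before) / before * 100.0


def window_returns(row: Dict, ticker: str) -> Dict[str, Optional[float]]:
    """Build column names from the `{ticker}_{metric}` pattern and compute moves."""
    at_post = row.get(f"{ticker}_at_post")
    return {
        "5min": pct_change(at_post, row.get(f"{ticker}_5min_after")),
        "1hr": pct_change(at_post, row.get(f"{ticker}_1hr_after")),
    }


# Toy rows with invented prices (illustrative only, not taken from the dataset).
posts: List[Dict] = [
    {"post_id": "a", "cat_threatening_intl": 1.0,
     "sp500_at_post": 5000.0, "sp500_5min_after": 4995.0, "sp500_1hr_after": 4950.0},
    {"post_id": "b", "cat_threatening_intl": 0.0,
     "sp500_at_post": 5000.0, "sp500_5min_after": 5001.0, "sp500_1hr_after": 5010.0},
    # Older post: no classification and no intraday prices available.
    {"post_id": "c", "cat_threatening_intl": None,
     "sp500_at_post": None, "sp500_5min_after": None, "sp500_1hr_after": None},
]

# Keep only posts where the ensemble marked the category as present (1).
flagged = [p for p in posts if p.get("cat_threatening_intl") == 1.0]

for post in flagged:
    print(post["post_id"], window_returns(post, "sp500"))
```

Because intraday resolution varies with post age, a real analysis would probably also condition on `sp500_resolution` before comparing 5-minute moves across posts.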
**Metrics per ticker:**

| Suffix | Description |
|--------|-------------|
| `_open` | Daily open price |
| `_close` | Daily close price |
| `_1hr_before` | Price 1 hour before the post |
| `_5min_before` | Price 5 minutes before the post |
| `_at_post` | Price at time of post |
| `_5min_after` | Price 5 minutes after the post |
| `_1hr_after` | Price 1 hour after the post |

**Tickers:**

| Prefix | Ticker | Name | Category |
|--------|--------|------|----------|
| `sp500_` | ^GSPC | S&P 500 | Broad market |
| `dia_` | DIA | SPDR Dow Jones Industrial Average ETF | Broad market |
| `qqq_` | QQQ | Invesco QQQ (Nasdaq-100) | Tech/growth |
| `djt_` | DJT | Trump Media & Technology Group | Trump-linked |
| `lmt_` | LMT | Lockheed Martin | Defense |
| `war_` | WAR | Themes US Military Academy ETF | Defense |
| `xli_` | XLI | Industrial Select Sector SPDR | Industrials |
| `xlv_` | XLV | Health Care Select Sector SPDR | Healthcare |
| `xph_` | XPH | SPDR S&P Pharmaceuticals ETF | Pharma |
| `cnrg_` | CNRG | SPDR S&P Kensho Clean Power ETF | Clean energy |
| `gld_` | GLD | SPDR Gold Shares | Gold/commodities |
| `uso_` | USO | United States Oil Fund | Oil/energy |
| `fxi_` | FXI | iShares China Large-Cap ETF | China/trade |
| `eww_` | EWW | iShares MSCI Mexico ETF | Mexico/trade |
| `vgk_` | VGK | Vanguard FTSE Europe ETF | Europe |
| `ibit_` | IBIT | iShares Bitcoin ETF | Crypto |
| `tlt_` | TLT | iShares 20+ Year Treasury Bond ETF | Bonds/rates |
| `uup_` | UUP | Invesco DB US Dollar Index | USD strength |

### GDELT geopolitical events (daily)

Daily aggregates of US-involved events from the [GDELT Project](https://www.gdeltproject.org/) via BigQuery. Each row gets the event counts for its post date. Based on CAMEO event coding of global news coverage.

**Note:** GDELT daily exports are typically available with a ~1 day lag. Posts from the most recent day may have null GDELT columns until the next daily update backfills them.

**Raw counts:**

| Field | Type | Description |
|-------|------|-------------|
| `gdelt_military` | int | US military assault/force/mass violence events (CAMEO 18-20) |
| `gdelt_sanctions` | int | Sanctions/embargo events (CAMEO 17) |
| `gdelt_threat` | int | Threat events (CAMEO 13) |
| `gdelt_protest` | int | Protest events (CAMEO 14) |
| `gdelt_force_posture` | int | Force posturing events (CAMEO 15) |
| `gdelt_diplomatic` | int | Diplomatic cooperation events (CAMEO 01-08) |
| `gdelt_material_conflict` | int | Material conflict events (QuadClass 4) |
| `gdelt_verbal_conflict` | int | Verbal conflict events (QuadClass 3) |
| `gdelt_material_cooperation` | int | Material cooperation events (QuadClass 2) |
| `gdelt_verbal_cooperation` | int | Verbal cooperation events (QuadClass 1) |
| `gdelt_goldstein_avg` | float | Average Goldstein scale for the day (-10 = max conflict, +10 = max cooperation) |
| `gdelt_avg_tone` | float | Average news tone for the day (negative = negative coverage) |
| `gdelt_total_events` | int | Total US-involved events |

**Derived:**

| Suffix | Description |
|--------|-------------|
| `_pct` | Share of total events (e.g., `gdelt_military_pct` = military events as % of total) |
| `_zscore` | Standard deviations above/below historical mean (flags unusual days) |
| `_delta` | Day-over-day change from previous day |

Available for: `military`, `sanctions`, `threat`, `protest`, `force_posture`, `diplomatic` (pct); `military`, `sanctions`, `threat`, `protest`, `material_conflict` (zscore and delta); `goldstein_avg`, `avg_tone` (delta).

## Intended Use

This dataset is intended for **academic research** in political science, computational social science, NLP, finance, and related fields.

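For event-driven research of this kind, it can help to see how the derived GDELT columns documented above (`_pct`, `_zscore`, `_delta`) relate to the raw daily counts. The sketch below recomputes them on a made-up four-day series. It rests on several assumptions the card does not settle: that `_pct` is on a 0 to 100 scale, that the z-score uses the mean and population standard deviation of the whole series rather than a rolling window, and that the first day has no delta. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional


def pct_of_total(count: float, total: float) -> Optional[float]:
    """Share of total events in percent (assumed 0-100 scale); None if total is 0."""
    if not total:
        return None
    return count / total * 100.0


def zscores(series: List[float]) -> List[Optional[float]]:
    """Standard deviations from the series mean (assumed: full-history, population std)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [None] * len(series)
    return [(x - mu) / sigma for x in series]


def deltas(series: List[float]) -> List[Optional[float]]:
    """Day-over-day change; the first day has no previous day, so it gets None."""
    return [None] + [today - prev for prev, today in zip(series, series[1:])]


# Invented daily counts for illustration (not real GDELT values).
military = [10, 12, 11, 30]
total_events = [1000, 1200, 1100, 1500]

military_pct = [pct_of_total(c, t) for c, t in zip(military, total_events)]
military_zscore = zscores(military)
military_delta = deltas(military)

for day, row in enumerate(zip(military, military_pct, military_zscore, military_delta)):
    print(day, row)
```

In this toy series the last day stands out on all three measures, which is the sort of unusual day the `_zscore` column is described as flagging.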
Example use cases:

- Analyzing the relationship between presidential social media activity and market movements
- Studying the timing and framing of aggressive policy announcements
- Discourse analysis and political communication research
- Event-driven analysis correlating posts with GDELT geopolitical indicators
- Accessibility research using AI-generated image descriptions

## Fair Use Notice

This dataset is compiled from publicly available posts by a public figure for academic research purposes under fair use (17 U.S.C. § 107). The data consists of factual records of public political speech.

Source data is from the [CNN/Stiles public archive](https://github.com/stiles/trump-truth-social-archive). Market data sourced from Yahoo Finance via yfinance. Geopolitical data from the GDELT Project.

Multiple peer-reviewed publications have established precedent for academic use of Truth Social data (see [ICWSM 2023](https://arxiv.org/abs/2303.11240), [arXiv:2411.01330](https://arxiv.org/abs/2411.01330)).

## Citation

If you use this dataset in your research, please cite this dataset and the underlying data sources:

### This dataset

```bibtex
@misc{soria2026trump_truth_social,
  title={Trump Truth Social Posts Archive},
  author={Soria, Christopher},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/chrissoria/trump-truth-social}
}
```

### Source data: Truth Social posts

The raw post data is sourced from Matt Stiles' CNN Truth Social archive:

```bibtex
@misc{stiles2024truthsocial,
  title={Trump Truth Social Archive},
  author={Stiles, Matt},
  year={2024},
  publisher={CNN},
  url={https://github.com/stiles/trump-truth-social-archive}
}
```

### Market data: Yahoo Finance

Stock and ETF price data is sourced from Yahoo Finance via the [yfinance](https://github.com/ranaroussi/yfinance) Python library:

```bibtex
@software{yfinance,
  title={yfinance: Download market data from Yahoo! Finance API},
  author={Aroussi, Ran},
  url={https://github.com/ranaroussi/yfinance},
  license={Apache-2.0}
}
```

### Geopolitical events: GDELT Project

Daily geopolitical event aggregates are sourced from the [GDELT Project](https://www.gdeltproject.org/):

```bibtex
@article{leetaru2013gdelt,
  title={GDELT: Global Data on Events, Location and Tone, 1979--2012},
  author={Leetaru, Kalev and Schrodt, Philip A.},
  journal={ISA Annual Convention},
  year={2013},
  url={https://www.gdeltproject.org/}
}
```

### LLM classification and image descriptions

Post classifications were generated using [cat-stack](https://github.com/chrissoria/cat-stack) with a 5-model ensemble (Llama 4 Maverick, Qwen3-32B, Claude 3 Haiku, GPT-4o-mini, Gemini 2.0 Flash). Image descriptions were generated using Qwen2.5-VL-72B.

```bibtex
@software{soria2026catstack,
  title={cat-stack: Domain-agnostic text, image, and PDF classification engine powered by LLMs},
  author={Soria, Christopher},
  year={2026},
  url={https://github.com/chrissoria/cat-stack}
}
```

## Part of the cat-pol ecosystem

This dataset is part of the [cat-pol](https://github.com/chrissoria/cat-pol) political text analysis toolkit. Install with:

```bash
pip install "cat-pol[sources]"
```

```python
from cat_pol.sources import fetch_trump_truths

df = fetch_trump_truths(since="2024-01-01")
```
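
Once posts are loaded, the `market_period` and `during_market_hours` fields documented in the post metadata table can be approximated from `time_eastern` and `day_of_week`. The sketch below is not the maintainer's implementation. It assumes a half-open trading window of 9:30 AM (inclusive) to 4:00 PM (exclusive) Eastern, labels `market_period` by clock time alone, and ignores market holidays because the card does not describe a holiday calendar. The function names are hypothetical.

```python
from datetime import time

# Regular US market session in Eastern time, per the field description.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}


def market_period(time_eastern: str) -> str:
    """Label an HH:MM:SS Eastern time as before, during, or after the session."""
    t = time.fromisoformat(time_eastern)
    if t < MARKET_OPEN:
        return "before_market"
    if t < MARKET_CLOSE:  # assumption: 16:00:00 itself counts as after the close
        return "after_market" if t >= MARKET_CLOSE else "during_market"
    return "after_market"


def during_market_hours(time_eastern: str, day_of_week: str) -> bool:
    """True only on weekdays inside the session (holidays are not handled)."""
    return day_of_week in WEEKDAYS and market_period(time_eastern) == "during_market"


for t, d in [("08:15:00", "Monday"), ("10:00:00", "Tuesday"), ("10:00:00", "Saturday")]:
    print(t, d, market_period(t), during_market_hours(t, d))
```

Comparing these labels against the dataset's own columns on a sample of rows would show whether the boundary and weekend assumptions match.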
Provider:
susan0322