Xiuze/Xstock
收藏Hugging Face2026-04-04 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Xiuze/Xstock
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
- zero-shot-classification
- table-question-answering
- other
language:
- en
tags:
- finance
- stock-prediction
- social-media-analysis
- b2b
- explainable-ai
- cot-reasoning
- sec-filings
pretty_name: Xstock
size_categories:
- 100K<n<1M
---
# Xstock: A 15-Year Longitudinal Dataset Bridging Social Media and B2B Financials for Explainable Stock Forecasting
[Paper Link (Coming Soon)] | [Project Repository](https://github.com/XiuzeZhou/Xstock/)
<p align="center">
<a href="#english">English</a> | <a href="#chinese">中文说明</a>
</p>
---
<a name="english"></a>
## 🇬🇧 English Description
## 💎Xstock Dataset
Xstock is a 15-year longitudinal dataset (2011–2025) designed to bridge the gap between B2B social media narratives and corporate financial performance. By merging **220,000 corporate tweets** with granular **SEC 10-K revenue segment filings**, Xstock provides a unique benchmark for explainable AI (XAI) in quantitative finance and business intelligence.
<p align="center">
<img src="https://raw.githubusercontent.com/XiuzeZhou/Xstock/main/figures/xstock_log.png" width="200">
</p>
## 🌟Dataset Summary
Unlike traditional sentiment datasets biased toward retail "hype," Xstock focuses on Business-to-Business (B2B) entities where stock performance is tied to institutional factors and official disclosures. The dataset reveals that B2B stock trends align closely with disclosure volume and fundamental revenue growth ($r=0.35$).
### Key Features
- **Longitudinal Scope**: 15 years of continuous data (2011–2025) capturing the evolution of B2B social media strategies.
- **Fiscally Grounded**: Each corporate tweet is mapped to specific revenue segments derived from 10-K filings, enabling "Segment-Led Attribution".
- **Narrative Insights**: Identifies "**Strategic Narrative Buffering**"—the phenomenon where firms pivot toward non-business signaling (e.g., ESG, Thought Leadership) during fiscal downturns.
- **XAI Benchmark**: Supports advanced reasoning paradigms like **Quantitative Chain-of-Thought (Q-CoT)** to mitigate optimism bias.
## 📂Dataset Structure
### Data Fields
- `company_name`: The name of the B2B entity (e.g., Accenture, Microsoft, GE).
- `fiscal_year`: The reporting period for the financial data.
- `industry_type`: The sector classification of the firm.
- `segment_list`: A breakdown of the company's core business units from 10-K filings.
- `tweets`: A collection of annual corporate tweets associated with the firm.
- `true_stock_label`: The actual annual stock price movement (Growth or Loss).
- `report_period`: The specific date range for the observation.
## 💡Applications & Benchmarking
### Explainable Stock Forecasting
Xstock facilitates a structured causal chain: Social Signals --> Segment Performance --> Stock Movement.
### Reasoning Paradigms
1). **Normal**: Direct sentiment-to-price mapping.
2). **Segment-Led CoT**: Injecting core revenue segments into the reasoning chain to improve "Loss" category recall by over 118%.
3). **Quantitative CoT (Q-CoT)**: Using weighted scoring and "de-hyping" heuristics (Rule A/B) to achieve a fourfold improvement in forecasting market losses.
## ⚖️License
This dataset is licensed under CC BY-NC 4.0. Non-commercial research use only.
## 📝Citation
If you use this dataset in your research, please cite our work:
```
```
---
<a name="chinese"></a>
## 🇨🇳 中文说明
## 💎 Xstock 数据集
Xstock 是一个跨度达 15 年(2011–2025)的纵向数据集,旨在弥补 B2B 企业社交媒体叙事与公司财务表现之间的研究空白。通过将 220,000 条企业推文 与精细化的 SEC 10-K 营收板块(Revenue Segment)报告 相结合,Xstock 为量化金融和商业智能领域的可解释人工智能(XAI)提供了一个独特的基准。
<p align="center">
<img src="https://raw.githubusercontent.com/XiuzeZhou/Xstock/main/figures/xstock_log.png" width="200">
</p>
## 🌟 数据集摘要
与传统侧重于散户“炒作(Hype)”的情感数据集不同,Xstock 专注于 B2B 实体,其股票表现与机构因素及官方披露密切相关。该数据集揭示了 B2B 股票趋势与披露量及基本面营收增长($r=0.35$)之间存在显著的一致性。
### 核心特性
- **长跨度纵向观察**:涵盖 15 年(2011–2025)的连续数据,捕捉了 B2B 社交媒体策略的演变历程。
- **财务基本面锚定**:每条企业推文均映射到源自 10-K 报告的特定业务板块,实现了“板块导向型归因(Segment-Led Attribution)”。
- **叙事洞察**:识别了**“战略叙事缓冲(Strategic Narrative Buffering)”**现象——即公司在财政低迷时期会转向非业务类信号(如 ESG、思想领导力)的传播。
- **XAI 基准测试**:支持**量化思维链(Q-CoT)**等高级推理范式,有效缓解了社交媒体叙事中的乐观偏见。
## 📂 数据集结构
### 数据字段说明
- `company_name`: B2B 实体名称(如:Accenture, Microsoft, GE)。
- `fiscal_year`: 财务数据所属的财年。
- `industry_type`: 公司的行业分类。
- `segment_list`: 摘自 10-K 报告的公司核心业务板块明细。
- `tweets`: 与该公司相关的年度企业推文集合。
- `true_stock_label`: 真实的年度股价走势(Growth 增长 或 Loss 下跌)。
- `report_period`: 该样本观测的具体日期范围。
## 💡 应用与基准测试
### 可解释股票预测
Xstock 支持构建结构化的因果链:社交信号 --> 业务板块表现 --> 股票走势。
### 推理范式
1). Normal(常规模式):直接从情感映射到价格走势的预测。
2). Segment-Led CoT(板块导向型 CoT):将核心营收板块引入推理链,使“Loss”类别的召回率提升超过 118%。
3). Quantitative CoT (Q-CoT,量化思维链):通过加权评分和“去干扰(De-hyping)”启发式规则(Rule A/B),在预测市场下跌方面实现了四倍的性能提升。
## ⚖️ 许可证
本数据集采用 CC BY-NC 4.0 许可证。仅限非商业性研究使用。
## 📝 引用 (Citation)
如果您在研究中使用了本数据集或代码,请引用我们的论文:
```
```
提供机构:
Xiuze



