five

Xiuze/Xstock

收藏
Hugging Face2026-04-04 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Xiuze/Xstock
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-nc-4.0 task_categories: - text-classification - zero-shot-classification - table-question-answering - other language: - en tags: - finance - stock-prediction - social-media-analysis - b2b - explainable-ai - cot-reasoning - sec-filings pretty_name: Xstock size_categories: - 100K<n<1M --- # Xstock: A 15-Year Longitudinal Dataset Bridging Social Media and B2B Financials for Explainable Stock Forecasting [Paper Link (Coming Soon)] | [Project Repository](https://github.com/XiuzeZhou/Xstock/) <p align="center"> <a href="#english">English</a> | <a href="#chinese">中文说明</a> </p> --- <a name="english"></a> ## 🇬🇧 English Description ## 💎Xstock Dataset Xstock is a 15-year longitudinal dataset (2011–2025) designed to bridge the gap between B2B social media narratives and corporate financial performance. By merging **220,000 corporate tweets** with granular **SEC 10-K revenue segment filings**, Xstock provides a unique benchmark for explainable AI (XAI) in quantitative finance and business intelligence. <p align="center"> <img src="https://raw.githubusercontent.com/XiuzeZhou/Xstock/main/figures/xstock_log.png" width="200"> </p> ## 🌟Dataset Summary Unlike traditional sentiment datasets biased toward retail "hype," Xstock focuses on Business-to-Business (B2B) entities where stock performance is tied to institutional factors and official disclosures. The dataset reveals that B2B stock trends align closely with disclosure volume and fundamental revenue growth ($r=0.35$). ### Key Features - **Longitudinal Scope**: 15 years of continuous data (2011–2025) capturing the evolution of B2B social media strategies. - **Fiscally Grounded**: Each corporate tweet is mapped to specific revenue segments derived from 10-K filings, enabling "Segment-Led Attribution". - **Narrative Insights**: Identifies "**Strategic Narrative Buffering**"—the phenomenon where firms pivot toward non-business signaling (e.g., ESG, Thought Leadership) during fiscal downturns. - **XAI Benchmark**: Supports advanced reasoning paradigms like **Quantitative Chain-of-Thought (Q-CoT)** to mitigate optimism bias. ## 📂Dataset Structure ### Data Fields - `company_name`: The name of the B2B entity (e.g., Accenture, Microsoft, GE). - `fiscal_year`: The reporting period for the financial data. - `industry_type`: The sector classification of the firm. - `segment_list`: A breakdown of the company's core business units from 10-K filings. - `tweets`: A collection of annual corporate tweets associated with the firm. - `true_stock_label`: The actual annual stock price movement (Growth or Loss). - `report_period`: The specific date range for the observation. ## 💡Applications & Benchmarking ### Explainable Stock Forecasting Xstock facilitates a structured causal chain: Social Signals --> Segment Performance --> Stock Movement. ### Reasoning Paradigms 1). **Normal**: Direct sentiment-to-price mapping. 2). **Segment-Led CoT**: Injecting core revenue segments into the reasoning chain to improve "Loss" category recall by over 118%. 3). **Quantitative CoT (Q-CoT)**: Using weighted scoring and "de-hyping" heuristics (Rule A/B) to achieve a fourfold improvement in forecasting market losses. ## ⚖️License This dataset is licensed under CC BY-NC 4.0. Non-commercial research use only. ## 📝Citation If you use this dataset in your research, please cite our work: ``` ``` --- <a name="chinese"></a> ## 🇨🇳 中文说明 ## 💎 Xstock 数据集 Xstock 是一个跨度达 15 年(2011–2025)的纵向数据集,旨在弥补 B2B 企业社交媒体叙事与公司财务表现之间的研究空白。通过将 220,000 条企业推文 与精细化的 SEC 10-K 营收板块(Revenue Segment)报告 相结合,Xstock 为量化金融和商业智能领域的可解释人工智能(XAI)提供了一个独特的基准。 <p align="center"> <img src="https://raw.githubusercontent.com/XiuzeZhou/Xstock/main/figures/xstock_log.png" width="200"> </p> ## 🌟 数据集摘要 与传统侧重于散户“炒作(Hype)”的情感数据集不同,Xstock 专注于 B2B 实体,其股票表现与机构因素及官方披露密切相关。该数据集揭示了 B2B 股票趋势与披露量及基本面营收增长($r=0.35$)之间存在显著的一致性。 ### 核心特性 - **长跨度纵向观察**:涵盖 15 年(2011–2025)的连续数据,捕捉了 B2B 社交媒体策略的演变历程。 - **财务基本面锚定**:每条企业推文均映射到源自 10-K 报告的特定业务板块,实现了“板块导向型归因(Segment-Led Attribution)”。 - **叙事洞察**:识别了**“战略叙事缓冲(Strategic Narrative Buffering)”**现象——即公司在财政低迷时期会转向非业务类信号(如 ESG、思想领导力)的传播。 - **XAI 基准测试**:支持**量化思维链(Q-CoT)**等高级推理范式,有效缓解了社交媒体叙事中的乐观偏见。 ## 📂 数据集结构 ### 数据字段说明 - `company_name`: B2B 实体名称(如:Accenture, Microsoft, GE)。 - `fiscal_year`: 财务数据所属的财年。 - `industry_type`: 公司的行业分类。 - `segment_list`: 摘自 10-K 报告的公司核心业务板块明细。 - `tweets`: 与该公司相关的年度企业推文集合。 - `true_stock_label`: 真实的年度股价走势(Growth 增长 或 Loss 下跌)。 - `report_period`: 该样本观测的具体日期范围。 ## 💡 应用与基准测试 ### 可解释股票预测 Xstock 支持构建结构化的因果链:社交信号 --> 业务板块表现 --> 股票走势。 ### 推理范式 1). Normal(常规模式):直接从情感映射到价格走势的预测。 2). Segment-Led CoT(板块导向型 CoT):将核心营收板块引入推理链,使“Loss”类别的召回率提升超过 118%。 3). Quantitative CoT (Q-CoT,量化思维链):通过加权评分和“去干扰(De-hyping)”启发式规则(Rule A/B),在预测市场下跌方面实现了四倍的性能提升。 ## ⚖️ 许可证 本数据集采用 CC BY-NC 4.0 许可证。仅限非商业性研究使用。 ## 📝 引用 (Citation) 如果您在研究中使用了本数据集或代码,请引用我们的论文: ``` ```
提供机构:
Xiuze
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作