A Tweet-based Dataset for Company-Level Stock Return Prediction
Source: arXiv | Updated: 2020-06-17 | Indexed: 2024-06-21
Download link:
https://github.com/ImperialNLP/stockreturnpred
Description:
This dataset, *A Tweet-based Dataset for Company-Level Stock Return Prediction*, was created by the Department of Computing at Imperial College London. It contains 862,231 English-language tweets and is intended for predicting company stock returns over horizons of one to seven days from social media text. After filtering and cleaning, it also provides a clean subset of 85,176 labeled instances. To build it, the researchers collected tweets mentioning companies ranked among the top 100 global brands and labeled them with the corresponding stock return information. The dataset is suited to building models that use textual information to forecast short-term stock price movements, and the listing describes it as particularly relevant to long-term fundamental investors.
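As a rough illustration of what labeling with stock returns over a one-to-seven-day horizon can look like, the sketch below computes simple forward returns from a list of daily closing prices. The function name `forward_returns`, the sample prices, and the use of simple (not log) returns are assumptions made for illustration; the listing does not specify the dataset's actual labeling procedure or file schema.

```python
from typing import List, Optional, Sequence


def forward_returns(closes: Sequence[float], horizon: int) -> List[Optional[float]]:
    """Simple return from day t to day t + horizon.

    Returns None for days where the future price is not available.
    Hypothetical helper; not taken from the dataset's own code.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    out: List[Optional[float]] = []
    for t, price in enumerate(closes):
        future = t + horizon
        if future < len(closes):
            out.append(closes[future] / price - 1.0)
        else:
            out.append(None)
    return out


# Made-up closing prices for a single company, one value per trading day.
closes = [100.0, 102.0, 101.0, 105.0]

# One label series per horizon (in trading days).
labels = {h: forward_returns(closes, h) for h in (1, 2, 3)}
```

A tweet posted on day t would then be paired with the return value at index t for the chosen horizon, and rows whose value is `None` would be dropped.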
Provided by:
Department of Computing, Imperial College London
Created:
2020-06-17



