U2VDow : Dow 30 Stocks tweets for proposing User2Vec approach
Source: Mendeley Data. Updated: 2021-02-17. Indexed: 2026-04-09.
Download link:
https://data.mendeley.com/datasets/dc6gdcz7n9/1
Description:
This data set was collected for the paper "User2Vec: stock market prediction using deep learning with a novel representation of social network users". Stock market prediction is an interesting and challenging problem for investors and financial analysts. Recently, recurrent neural networks such as LSTM have shown good performance in stock market prediction. Most current methods use historical market data and, in some cases, the dominant direction of users and news for each day. In some cases, the opinions of social network members about the stocks are extracted to improve prediction accuracy. These works usually treat the opinions of different users in the same way and give them the same weight. However, these opinions differ in value according to how accurate the user's predictions have been. In this study, the idea is to convert the opinion of each user about each stock into a vector (User2Vec), then use these vectors to train a Recurrent Neural Network (RNN) and ultimately model the behavior of the users in the market. The proposed user representation is composed of features extracted from the messages posted in a social network and from the market data. Here, we consider the power of the user in predicting the future of the stock based on social network metrics, e.g. the number of followers of the user, and the accuracy of the user's previous predictions. This way, the amount of training data is increased and the model is trained effectively. These data are then used to train a stacked bidirectional LSTM network that aggregates the input data and provides the final prediction. Empirical studies of the proposed model on the 30 stocks of the Dow Jones 30 clearly show the superiority of the proposed model over traditional representations. For example, the prediction accuracy is about 93% for the Apple stock, which is much higher than that of the compared models.
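As a rough illustration of the idea of turning one user's opinion into a vector, the sketch below combines a message-level sentiment score with the two user-level signals named in the description: follower count and the accuracy of earlier predictions. The `UserOpinion` schema, the field names, the log scaling and the smoothing are all assumptions made for this example; the description does not specify the paper's actual feature set, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class UserOpinion:
    """One user's stance on one stock for one day (hypothetical schema)."""
    sentiment: float   # assumed precomputed, -1.0 (bearish) to +1.0 (bullish)
    followers: int     # social network metric mentioned in the description
    past_hits: int     # earlier predictions that matched the market move
    past_total: int    # all earlier predictions made by this user


def user_vector(op: UserOpinion) -> List[float]:
    """Build an illustrative 3-dimensional user representation."""
    # Log scaling (an assumption) keeps very popular accounts from dominating.
    reach = math.log1p(op.followers)
    # Laplace smoothing (an assumption) so a user with no history starts at 0.5.
    accuracy = (op.past_hits + 1) / (op.past_total + 2)
    return [op.sentiment, reach, accuracy]


# A bullish user with 999 followers who was right 8 times out of 10.
vec = user_vector(UserOpinion(sentiment=1.0, followers=999, past_hits=8, past_total=10))
```

Vectors of this kind, one per user per stock per day, would then be arranged into sequences and passed to a recurrent model such as the stacked bidirectional LSTM mentioned in the description.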
Created:
2021-02-17



