
Applying machine learning to study correlations, if any, between news content and stock price movements

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
https://doi.org/10.7910/DVN/HUK9TF
Description:
Current machine learning techniques solve text classification problems quite successfully. Text such as consumer reviews or email can be classified as favorable/unfavorable, spam/not-spam, and so on, with a high success rate. News content is likewise known to affect human sentiment, leading to sharp, short-term stock price movements that follow positive or negative news. The attached sample dataset may be used to train a machine learning model to classify news text and predict its influence on the stock price, and subsequently to derive buy/sell recommendations. A predicted downward price movement may also help institutions engaged in Lombard lending (lending against securities as collateral) employ proactive risk mitigation.

The dataset contains news articles and the empirical stock price movements following the news publication date. Attributing a stock price move to a specific news incident alone is difficult, because several factors influence the stock price. However, we have selected stocks and incident dates where the stock significantly outperformed or underperformed its industry peers. The effects of broader market and industry factors can therefore be assumed to be less significant, because such factors, if they have any causal effect at all, would cause all industry peers to rise or fall in tandem. In other words, a data point is included only if the company's stock price showed a statistically significant upward or downward change relative to its industry peers in the reference time period.

Secondly, earnings-related news (a fundamental factor in the attractiveness of a stock) is omitted, to keep the analysis limited to incident news alone. The reference time period for evaluating under- or outperformance is kept to a maximum of 10 days, so that only "short-term" price movements are captured. This helps exclude scenarios where the stock price was affected by the company's business and operational realities, e.g. the actual (not reported) success or failure of its product or service, as such events play out over a relatively long term.

In short, due care (feature engineering) has been taken in curating this dataset to serve its intended application. Please note that this is only a sample dataset of roughly 100 records. The full dataset can be requested for non-commercial use. Please contact me via this platform or via LinkedIn.
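
The peer-relative selection described above can be illustrated with a small sketch. It is not the procedure used to build this dataset: the description does not say which significance test was applied, so the z-score against the peer cross-section, the threshold of 2.0, the function names, and the treatment of "10 days" as 10 steps in a daily price series are all assumptions made for illustration.

```python
from statistics import mean, stdev


def window_return(prices, start, horizon=10):
    """Simple return from index `start` to at most `horizon` steps later."""
    end = min(start + horizon, len(prices) - 1)
    return prices[end] / prices[start] - 1.0


def label_event(stock_prices, peer_prices, start, horizon=10, z_threshold=2.0):
    """Label a news event 'up', 'down', or 'neutral' relative to industry peers.

    Hypothetical rule: compare the stock's return over the window with the
    mean and standard deviation of its peers' returns over the same window.
    Needs at least two peer series, because `stdev` requires two data points.
    """
    stock_ret = window_return(stock_prices, start, horizon)
    peer_rets = [window_return(p, start, horizon) for p in peer_prices]
    spread = stdev(peer_rets)
    if spread == 0:
        # All peers moved identically; no scale to judge the deviation against.
        return "neutral"
    z = (stock_ret - mean(peer_rets)) / spread
    if z >= z_threshold:
        return "up"
    if z <= -z_threshold:
        return "down"
    return "neutral"


# Toy daily closing prices, invented for illustration; index 0 is the news date.
peers = [
    [100.0] * 10 + [101.0],
    [100.0] * 10 + [102.0],
    [100.0] * 10 + [103.0],
]
stock = [100.0] * 10 + [120.0]
label = label_event(stock, peers, start=0)
```

Under this rule, events labeled "neutral" would be dropped, and the remaining "up"/"down" labels could serve as targets for a news text classifier.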
Created:
2020-08-14