
Data for: The textual similarity of news content and stock return synchronicity

DataCite Commons | Updated: 2025-05-06 | Indexed: 2025-05-17
Download link:
https://data.mendeley.com/datasets/ywkpdg58c3
Description:
This dataset accompanies the study “The textual similarity of news content and stock return synchronicity”, which investigates how the homogeneity of news narratives across firms relates to the synchronicity of their stock returns. Our research hypothesis is that higher textual similarity in firm-specific news leads to greater stock return synchronicity, because more uniform information reduces firm-specific variation in investor beliefs and trading behavior.
The data include firm-level measures of news textual similarity and stock return synchronicity for publicly listed firms over the period 2013 to 2022. Textual similarity is computed as the cosine similarity between TF-IDF representations of firm-specific news articles collected from reputable financial news sources. We preprocess the news content by removing stop words and applying standard tokenization and lemmatization. News articles are grouped by firm and time period, and similarity is measured against a rolling market-wide benchmark. Stock return synchronicity is quantified using the R² statistic from a market model regression, following established literature: a higher R² indicates stronger co-movement with the market and weaker firm-specific return variation.
The original news text, stock trading data, and accounting data used in this study are sourced from the China Stock Market and Accounting Research (CSMAR) database, while the textual tone of MD&A is sourced from the Chinese Research Data Services Platform (CNRDS). The news sources include both traditional print media and internet media. The sample excludes (i) financial firms, (ii) firms listed for less than one year, and (iii) firms with missing values for control variables. After filtering, the final sample comprises 82,215 observations covering 4,102 firms.
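The two measures described above can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the smoothed IDF formula, the function names (`tfidf_vectors`, `cosine`, `market_model_r2`), and the toy inputs are assumptions made for illustration. The rolling market-wide benchmark and any transformation of R² used in the study are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors from a list of token lists.

    Assumption: raw term counts times a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1. The study's exact
    weighting scheme is not stated in the description.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: one count per document
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def market_model_r2(firm_returns, market_returns):
    """R² of the one-factor OLS regression r_firm = a + b * r_mkt + e.

    With a single regressor, R² equals the squared sample correlation.
    """
    n = len(firm_returns)
    mean_f = sum(firm_returns) / n
    mean_m = sum(market_returns) / n
    sxx = sum((m - mean_m) ** 2 for m in market_returns)
    syy = sum((f - mean_f) ** 2 for f in firm_returns)
    sxy = sum((m - mean_m) * (f - mean_f)
              for f, m in zip(firm_returns, market_returns))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Toy, already-tokenized documents (hypothetical data).
docs = [["profit", "rises", "profit"],
        ["profit", "rises", "profit"],
        ["lawsuit", "filed"]]
vecs = tfidf_vectors(docs)
similarity_same = cosine(vecs[0], vecs[1])      # identical token lists
similarity_disjoint = cosine(vecs[0], vecs[2])  # no shared terms
```

In this sketch, a firm whose news shares no vocabulary with the comparison document scores 0, and identical documents score 1. A firm whose returns are an exact linear function of market returns has an R² of 1.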
Our data show a robust positive correlation between news similarity and stock return synchronicity, even after controlling for firm fundamentals, media coverage volume, and other confounding factors. This finding suggests that uniform media narratives can reduce the information diversity available to investors, contributing to higher return co-movement.
This dataset includes:
ASVImonthly.dta
base_data.dta
BellWether_Newsprop.dta
DisAcc.dta
isAnnoym.dta
NewsNumlarge8ym.dta
numAholder_yq.dta
ReportSim_ym.dta
Rmkt.dta
sigma_mkt.dta
Stkcd_ym_NewsTone.dta
Topic_wordscomovement.dta
yearMDATone.dta
ymChinaNewsBasedEPU.dta
ymCICSI.dta
The data can be used to explore information diffusion, media effects in financial markets, and the mechanisms behind co-movement in asset prices. Researchers replicating or extending this work can match the firm identifiers and timestamps with other financial databases such as CSMAR or CNRDS.
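Matching the files on firm identifier and time period might look like the sketch below, which uses pandas. The key column names `Stkcd` and `ym` are assumptions inferred from the file name `Stkcd_ym_NewsTone.dta`, and the helper `merge_firm_month` is hypothetical. The actual variable names should be checked after loading each file.

```python
import pandas as pd


def merge_firm_month(left, right, keys=("Stkcd", "ym")):
    """Left-join firm-month data onto a base panel.

    Assumption: both frames carry the key columns named in `keys`.
    validate="m:1" raises an error if `right` has duplicate keys,
    which guards against silently multiplying rows.
    """
    return left.merge(right, on=list(keys), how="left", validate="m:1")


# Hypothetical usage once the files are downloaded:
# base = pd.read_stata("base_data.dta")
# tone = pd.read_stata("Stkcd_ym_NewsTone.dta")
# panel = merge_firm_month(base, tone)
```

A left join keeps every observation of the base panel. Firm-months without a match in the second file receive missing values instead of being dropped.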
Provider:
Mendeley Data
Created:
2025-05-06