
Brain Language Metrics on Company Filings - Live Feed

Snowflake | Updated 2022-07-12 | Indexed 2024-05-01
Download link:
https://app.snowflake.com/marketplace/listing/GZSVZD5BWX
Description:
The Brain Language Metrics on Company Filings (BLMCF) dataset monitors several language metrics on the 10-K and 10-Q reports of 6,000+ US stocks. Recent literature claims that the market responds inefficiently to the information in company filings because of the increased complexity and length of such reports. See for example:

- “Lazy Prices”, Cohen et al., 2018
- “The Positive Similarity of Company Filings and the Cross-Section of Stock Returns”, M. Padysak, 2020
- “How to Use Lexical Density of Company Filings”, D. Hanicova et al., 2021

The dataset contains historical data from January 2010 and live data updated daily by 12pm UTC.

DATASET STRUCTURE AND KEY FIELDS

The dataset consists of a single schema, "LANGUAGE_METRICS_COMPANY_FILINGS", and can be logically divided into two parts.

The first part includes the language metrics of the most recent 10-K or 10-Q report for each firm. It is stored in the tables "METRICS_10K" (metrics for 10-K reports) and "METRICS_ALL" (metrics for 10-K and 10-Q reports). The key metrics are:

1. Financial sentiment (field SENTIMENT)
2. Percentage of words belonging to the financial domain, classified by language type:
   - “Constraining” language (field SCORE_CONSTRAINING)
   - “Interesting” language (field SCORE_INTERESTING)
   - “Litigious” language (field SCORE_LITIGIOUS)
   - “Uncertainty” language (field SCORE_UNCERTAINTY)
3. Readability score (field READABILITY)
4. Lexical metrics such as lexical richness and density (fields LEXICAL_RICHNESS and LEXICAL_DENSITY)
5. Text statistics such as the report length and the average sentence length (fields N_SENTENCES and MEAN_SENTENCE_LENGTH)

The second part includes the differences between the two most recent 10-K or 10-Q reports of the same period for each company. It is stored in the tables "DIFFERENCES_10K" (differences of metrics for 10-K reports) and "DIFFERENCES_ALL" (differences of metrics for 10-K and 10-Q reports). The key metrics are:

1. Differences in the various language metrics (e.g. delta sentiment, delta readability, delta percentage of a specific language type). For example, the field DELTA_SENTIMENT represents the difference in financial sentiment between the last available report and the previous report of the same period and category.
2. Similarity metrics between documents, also with respect to a specific language type (for example, similarity with respect to “litigious” or “uncertainty” language). For example, the field SIMILARITY_ALL represents the language similarity between the last available report and the previous report of the same period and category.

The dataset includes the metrics and related differences both for the whole report and for specific sections (Risk Factors and Management Discussion and Analysis).

FACTSHEET

Link to factsheet: https://braincompany.co/assets/files/BLM_CF_V2_summary.pdf

DISCLAIMER

The content of this dataset is not intended as investment advice. The material is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Brain. Brain makes no guarantees regarding the accuracy and completeness of the information expressed in the dataset.
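EXAMPLE QUERY (ILLUSTRATIVE)

The schema, table, and field names listed under DATASET STRUCTURE AND KEY FIELDS can be combined into a simple query. The Python sketch below only builds the SQL text; it does not connect to Snowflake. It assumes that the database containing the share is already selected in the session, that the fields appear as columns under exactly the names given in the description, and that DELTA_SENTIMENT and SIMILARITY_ALL live in the DIFFERENCES tables. The helper build_query is hypothetical and not part of any Brain or Snowflake API.

```python
# Names below are copied from the listing text; verify them against the
# live share before relying on this sketch.
SCHEMA = "LANGUAGE_METRICS_COMPANY_FILINGS"

METRIC_FIELDS = (
    "SENTIMENT", "SCORE_CONSTRAINING", "SCORE_INTERESTING",
    "SCORE_LITIGIOUS", "SCORE_UNCERTAINTY", "READABILITY",
    "LEXICAL_RICHNESS", "LEXICAL_DENSITY",
    "N_SENTENCES", "MEAN_SENTENCE_LENGTH",
)
# Only the two difference fields named in the description are listed here.
DIFFERENCE_FIELDS = ("DELTA_SENTIMENT", "SIMILARITY_ALL")

KNOWN_FIELDS = {
    "METRICS_10K": METRIC_FIELDS,
    "METRICS_ALL": METRIC_FIELDS,
    "DIFFERENCES_10K": DIFFERENCE_FIELDS,
    "DIFFERENCES_ALL": DIFFERENCE_FIELDS,
}


def build_query(table, fields, limit=10):
    """Return a SELECT statement for one of the documented tables.

    Hypothetical helper: rejects tables or fields that the listing
    does not mention, so typos fail early instead of reaching Snowflake.
    """
    if table not in KNOWN_FIELDS:
        raise ValueError(f"unknown table: {table}")
    if not fields:
        raise ValueError("at least one field is required")
    unknown = [f for f in fields if f not in KNOWN_FIELDS[table]]
    if unknown:
        raise ValueError(f"fields not documented for {table}: {unknown}")
    if int(limit) <= 0:
        raise ValueError("limit must be positive")
    columns = ", ".join(fields)
    return f"SELECT {columns} FROM {SCHEMA}.{table} LIMIT {int(limit)}"


if __name__ == "__main__":
    print(build_query("DIFFERENCES_ALL", ["DELTA_SENTIMENT", "SIMILARITY_ALL"]))
```

The resulting string can be passed to any Snowflake client. Identifier columns such as ticker or filing date are not named in the description, so they are left out here.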
Provider:
Brain
Created:
2022-07-12