Financial Claim Detection Dataset
Source: arXiv (2024-02-19); updated 2024-08-06
Download link:
http://arxiv.org/abs/2402.11728v1
Summary:
The Financial Claim Detection Dataset was created by researchers at the Georgia Institute of Technology, Dayalbagh Institute of Technology, and the Indian Institute of Technology Kharagpur for the task of claim detection in the financial domain. It contains more than 2.3 million finance-related sentences that include numbers, drawn from the quarterly analyst reports and earnings call transcripts of a large number of publicly listed U.S. companies. To build it, the authors split the raw text into sentences using regular-expression rules, kept the sentences containing both numbers and financial terms, and then filtered them further with financial lexicons to ensure financial relevance. The dataset is intended for developing and evaluating models that identify predictive claims in financial reports, with applications such as predicting market reactions and gauging analyst optimism.
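The filtering pipeline described above (regex sentence splitting, then keeping sentences that contain numbers and financial terms) can be illustrated with a minimal sketch. This is not the authors' code: the splitting regex, the number pattern, and the small `FINANCIAL_TERMS` lexicon below are hypothetical stand-ins, since the exact rules and lexicons used to build the dataset are not given here.

```python
import re
from typing import Iterable, List

# Hypothetical, tiny financial lexicon for illustration only; the dataset's
# actual lexicons are not specified in this summary.
FINANCIAL_TERMS = {
    "revenue", "earnings", "eps", "margin", "guidance",
    "sales", "growth", "income", "profit", "dividend",
}

# Naive sentence splitter (assumption): break after ., ! or ? followed by
# whitespace. Real reports need more care with abbreviations like "Inc.".
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# Matches numbers such as 12, 3.5, 1,200 or 4%, optionally preceded by "$".
NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")

# Lowercase alphabetic tokens used for the lexicon lookup.
WORD = re.compile(r"[a-z]+")


def split_sentences(text: str) -> List[str]:
    """Split raw text into sentences with a simple regex rule."""
    return [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]


def is_financial_numeric(sentence: str, lexicon: Iterable[str] = FINANCIAL_TERMS) -> bool:
    """Keep a sentence only if it has a number and at least one lexicon term."""
    if not NUMBER.search(sentence):
        return False
    words = set(WORD.findall(sentence.lower()))
    return not words.isdisjoint(lexicon)


def extract_candidates(text: str) -> List[str]:
    """Return the sentences of `text` that pass both the number and lexicon checks."""
    return [s for s in split_sentences(text) if is_financial_numeric(s)]


if __name__ == "__main__":
    transcript = (
        "Thank you for joining us today. "
        "We expect revenue to grow by 8% next quarter. "
        "Our team has 12 new members. "
        "Operating margin was 21.5% in Q2."
    )
    for sentence in extract_candidates(transcript):
        print(sentence)
```

Here "Our team has 12 new members." contains a number but no lexicon term, so it is dropped; the lexicon check is what separates financially relevant numerals from incidental ones.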
Provided by:
Georgia Institute of Technology, Dayalbagh Institute of Technology, Indian Institute of Technology Kharagpur
Created:
2024-02-19



