Form 10-K Filings
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/ChicagoHAI/characterizing-multimodal-long-form-summarization
Description:
This dataset contains 1,000 Form 10-K filings in HTML format. A 10-K is the comprehensive annual report that the U.S. Securities and Exchange Commission (SEC) requires of companies; it details the company's financial condition, business overview, and other mandatory disclosures. The dataset focuses on Item 7, "Management's Discussion and Analysis of Financial Condition and Results of Operations" (MD&A). It also includes both the original and shuffled versions of the reports, so that position bias in large language models (LLMs) can be explored. The task is to summarize the reports.
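To illustrate the idea of pairing an original report with a shuffled version, here is a minimal Python sketch. It is not the dataset's actual procedure: the listing does not say at what granularity the reports were shuffled, so the section list, the function name `shuffle_sections`, and the seed are all hypothetical.

```python
import random


def shuffle_sections(sections, seed=0):
    """Return a shuffled copy of `sections` and the permutation used.

    `sections` is assumed to be a list of text chunks from one report.
    The split into chunks is an assumption made for this sketch.
    """
    rng = random.Random(seed)           # local RNG so the result is reproducible
    order = list(range(len(sections)))  # original positions 0..n-1
    rng.shuffle(order)                  # permute the positions in place
    return [sections[i] for i in order], order


# Hypothetical stand-in for the sections of a single 10-K report.
report = [
    "Item 1. Business",
    "Item 1A. Risk Factors",
    "Item 7. MD&A",
    "Item 8. Financial Statements",
]

shuffled, order = shuffle_sections(report, seed=42)
print(order)
print(shuffled)
```

Keeping the permutation alongside the shuffled text makes it possible to map any part of a model's summary back to its original position, which is what a position-bias comparison between the two versions would need.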
Provided by:
U.S. Securities and Exchange Commission (SEC)



