Generative AI rewritten SEC filings
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://data.mendeley.com/datasets/xxpjm7pxbg
下载链接
链接失效反馈官方服务:
资源简介:
The rewritten filings in this dataset were generated using large language models from OpenAI, specifically GPT-4o and GPT-4o-mini. The rewriting process focused on the Management Discussion and Analysis (MD&A) part (section 7 and section 2, respectively) of 10-K and 10-Q filings, with the goal of maintaining the original content while improving the sentiment. There are 11,266 10-K and 32,620 10-Q filings. The sample was selected ensuring neutrality by considering both the year and sector. This method ensures a balanced representation across different time periods and industries, avoiding biases related to specific sectors or years in the rewritten filings.
The rewritten filings in this dataset are saved in text files named according to the format: cik + '_' + accession number of the filing + '_section' + section number + '_' + model + '.txt'
The following query was used:
Please rewrite the provided MD&A section of a 10-K filing. Your goal is to create a new version that maintains the original meaning, key details, and financial information, but with more positive wording and phrasing. Ensure the rewritten text is coherent, professionally written, and retains the appropriate tone for a financial report. Also, enhance the positive sentiment by highlighting achievements, growth, and opportunities, while preserving all factual content. \n\nOriginal Text:
创建时间:
2024-10-22



