Annual Reports Assessment Dataset

NIAID Data Ecosystem (indexed 2026-03-14)

Download link:

https://zenodo.org/record/7536331

Description:

This dataset is intended to help investors, merchant bankers, credit rating agencies, and equity research analysts explore annual reports in a more automated way, saving them time.

It contains the following sub-datasets:

a) PDFs and corresponding OCR text of 100 Indian annual reports. These reports are for the 100 largest companies listed on the Bombay Stock Exchange (BSE). The OCRed text totals 12.25 million words.

b) A few examples of sentences with corresponding classes. The author defined 16 topics widely used in the investment community as classes:

- Accounting Standards
- Accounting for Revenue Recognition
- Corporate Social Responsibility
- Credit Ratings
- Diversity Equity and Inclusion
- Electronic Voting
- Environment and Sustainability
- Hedging Strategy
- Intellectual Property Infringement Risk
- Litigation Risk
- Order Book
- Related Party Transaction
- Remuneration
- Research and Development
- Talent Management
- Whistle Blower Policy

These classes should help generate ideas and inform investment decisions, as well as identify red flags and early warning signs of trouble when everything appears to be proceeding smoothly.

About the data:

- "scrips.json": a JSON file with the names of the companies.
- "SC_CODE": the BSE scrip ID.
- "SC_NAME": the listed company's name.
- "NET_TURNOV": the turnover on the day of consideration.
- "source_pdf": a folder containing both the PDFs and the OCR output from Tesseract.
- "raw_pdf.zip": the raw PDFs, which can be used to try another OCR engine.
- "ocr.zip": contains a JSON file (annual_report_content.json) with the OCR text for each PDF.
- "annual_report_content.json": an array of 100 elements, each with two keys, "file_name" and "content".
- "classif_data_rank_freezed.json": used for evaluating results; contains each "sentence" and its corresponding "class".
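As a rough illustration of how the two JSON files described above might be read, here is a minimal Python sketch. It relies on the documented keys ("file_name", "content", "sentence", "class") and assumes, without confirmation from the dataset itself, that "classif_data_rank_freezed.json" is a flat array of objects. The helper names (load_json, index_reports, word_counts, class_distribution) are hypothetical and not part of the dataset.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path):
    """Read a UTF-8 JSON file, e.g. annual_report_content.json from ocr.zip."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def index_reports(records):
    """Map each report's "file_name" to its OCR "content"."""
    return {rec["file_name"]: rec["content"] for rec in records}


def word_counts(reports):
    """Count whitespace-separated tokens in each report's OCR text."""
    return {name: len(text.split()) for name, text in reports.items()}


def class_distribution(examples):
    """Count labelled sentences per class.

    Assumption: examples is a list of {"sentence": ..., "class": ...} objects.
    """
    return Counter(ex["class"] for ex in examples)
```

After extracting ocr.zip, something like word_counts(index_reports(load_json("annual_report_content.json"))) would give a per-report word count; the path depends on where the archive was unpacked. Whitespace tokenization is only one way to count words, so the total need not match the 12.25 million figure exactly.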

Created:

2023-01-14
