Annual Reports Assessment Dataset

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7536331
Description:
This dataset is intended to help investors, merchant bankers, credit rating agencies, and equity research analysts explore annual reports in a more automated way, saving them time.

The dataset contains the following sub-datasets:

a) PDFs and corresponding OCR text of 100 Indian annual reports. These reports belong to the 100 largest companies listed on the Bombay Stock Exchange (BSE). The OCRed text totals 12.25 million words.

b) A few example sentences with corresponding classes. The author defined 16 classes, each a topic widely used in the investment community:

    Accounting Standards
    Accounting for Revenue Recognition
    Corporate Social Responsibility
    Credit Ratings
    Diversity Equity and Inclusion
    Electronic Voting
    Environment and Sustainability
    Hedging Strategy
    Intellectual Property Infringement Risk
    Litigation Risk
    Order Book
    Related Party Transaction
    Remuneration
    Research and Development
    Talent Management
    Whistle Blower Policy

These classes should help generate ideas and inform investment decisions, as well as identify red flags and early warning signs of trouble when everything appears to be proceeding smoothly.

ABOUT DATA

"scrips.json" is a JSON file listing the companies:
    "SC_CODE" is the BSE scrip ID
    "SC_NAME" is the listed company's name
    "NET_TURNOV" is the turnover on the day of consideration

"source_pdf" is a folder containing both the PDFs and the OCR output from Tesseract:
    "raw_pdf.zip" contains the raw PDFs and can be used to try another OCR engine.
    "ocr.zip" contains a JSON file ("annual_report_content.json") with the OCR text for each PDF.
    "annual_report_content.json" is an array of 100 elements; each element has two keys, "file_name" and "content".

"classif_data_rank_freezed.json" is used for evaluating results; it contains each "sentence" and its corresponding "class".
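The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the dataset author's code. It assumes "annual_report_content.json" has already been extracted from "ocr.zip", and it assumes that "classif_data_rank_freezed.json" is a JSON array of objects with "sentence" and "class" keys (the description names the keys but does not state the top-level layout). The function names are hypothetical.

```python
import json
from collections import Counter


def load_reports(path):
    """Return {file_name: content} from annual_report_content.json.

    The description states the file is an array whose elements each
    have the keys "file_name" and "content".
    """
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    return {rec["file_name"]: rec["content"] for rec in records}


def word_count(text):
    """Count whitespace-separated tokens (a rough proxy for words)."""
    return len(text.split())


def class_counts(path):
    """Count labelled sentences per class in the evaluation file.

    ASSUMPTION: the file is a JSON array of objects, each with the
    keys "sentence" and "class". Adjust if the real layout differs.
    """
    with open(path, encoding="utf-8") as fh:
        rows = json.load(fh)
    return Counter(row["class"] for row in rows)
```

With the real files in place, summing word_count over the values returned by load_reports gives a rough total to compare against the stated 12.25 million words. The figure will depend on how words are tokenized, so an exact match should not be expected.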
Created:
2023-01-14