ChatGLM2-6B
Source: OpenXLab (indexed 2026-04-18)
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/ChatGLM2-6B
Description:
We have uploaded a collection of annual reports of listed companies, covering 2019 to 2021, to the ModelScope community. The dataset contains 11,588 PDF files. You can use the contents of these files to build whatever database or vector store you need.
Recommended steps:
1. PDF text and table extraction: Use a toolkit such as pdfplumber or pdfminer to extract text and table data from the PDF files.
2. Data segmentation: Split each PDF's content along its table of contents, sub-headings, and chapter boundaries.
3. Build a basic financial database: Design the database fields and formats from financial domain knowledge and the PDF content, for example tables for the balance sheet, cash flow statement, and income statement.
4. Information extraction: Use the information extraction capabilities of large models and NLP techniques to pull out the corresponding financial fields. An example prompt: "Please use JSON mode to output the table of contents, with the chapter name as the key and the page number as the value. Also extract the data in the tables in detail and output it in JSON format."
5. Build a financial question-answering database: Apply large models to the financial database from step 3 to build a basic financial Q&A database.
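Steps 1, 2, and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dataset authors' pipeline. The helper names (extract_pages, parse_toc, segment_by_toc) are hypothetical. The table-of-contents line shape (title, dot leaders, page number) is an assumption, and the regex rule is only a stand-in for the large-model extraction described in step 4. The sketch also assumes that printed page numbers match PDF page order, which real reports often violate.

```python
import json
import re


def extract_pages(pdf_path):
    """Step 1: return one {"text": str, "tables": list} dict per PDF page."""
    # Third-party dependency, imported lazily so the helpers below run without it.
    import pdfplumber

    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            pages.append({
                "text": page.extract_text() or "",
                "tables": page.extract_tables(),
            })
    return pages


# Assumed TOC line shape: title, at least two leader characters, page number.
TOC_LINE = re.compile(r"^(?P<title>.+?)\s*[.\u2026\u00b7]{2,}\s*(?P<page>\d+)\s*$")


def parse_toc(toc_text):
    """Step 4 (rule-based stand-in): map chapter name -> starting page number."""
    toc = {}
    for line in toc_text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            toc[match.group("title").strip()] = int(match.group("page"))
    return toc


def segment_by_toc(page_texts, toc):
    """Step 2: join page texts into one string per chapter.

    page_texts[0] is treated as page 1 (an assumption); each chapter runs
    until the page where the next chapter starts.
    """
    ordered = sorted(toc.items(), key=lambda item: item[1])
    sections = {}
    for index, (title, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]
        else:
            end = len(page_texts) + 1
        sections[title] = "\n".join(page_texts[start - 1:end - 1])
    return sections


if __name__ == "__main__":
    # Made-up TOC text and page contents, for illustration only.
    sample_toc = (
        "Important Notes ...... 1\n"
        "Company Profile ...... 2\n"
        "Financial Report ...... 4"
    )
    toc = parse_toc(sample_toc)
    print(json.dumps(toc, ensure_ascii=False, indent=2))
    pages = ["notes", "profile a", "profile b", "report a", "report b"]
    print(json.dumps(segment_by_toc(pages, toc), ensure_ascii=False, indent=2))
```

With a real report, the page texts would come from extract_pages, for example [p["text"] for p in extract_pages(path)], and the JSON produced by parse_toc has the shape the step 4 prompt asks for: chapter name as key, page number as value.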
Provider:
OpenDataLab
Created:
2024-05-14



