
National Bidding and Tendering Data Commercial Mining and Analysis Data Model

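Source: 贵州省数据知识产权登记平台 (Guizhou Provincial Data Intellectual Property Registration Platform); updated 2025-01-08; indexed 2025-01-09
Download link: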
https://gzdipp.gzsis.cn:12020/noticeDetail?id=308&type=1
Resource description:
The National Bidding and Tendering Data Commercial Mining and Analysis Data Model combines three types of models:
1. Bidding process classification: a vertical-domain classification model that uses TextCNN as the base model, combined with word segmentation, bidding-domain data, and application scenarios.
2. Named entity recognition (NER): an entity extraction model based on the BERT base model, with relative position, part-of-speech, and other information added. It is fine-tuned on bidding-domain data using a manually labeled training set.
3. OCR text recognition: a domain-specific OCR model produced by fine-tuning PaddleOCR on manually annotated image data.
Together, the three models identify many kinds of bidding documents and recognize their content, turning data of varied structure into a large national bidding data collection that combines structured and semi-structured data. Websites, apps, and mini-programs can analyze, access, and retrieve this data in real time.
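The way the three stages could feed one structured record can be sketched as below. This is a minimal illustration only, under loud assumptions: the listing does not publish its class labels, entity types, or model interfaces, so the label set, the entity patterns, and every function name here are hypothetical. Keyword rules and regular expressions stand in for the TextCNN classifier, the BERT-based NER model, and the fine-tuned OCR model; none of them is the registered model.

```python
import re
from typing import Dict

# Hypothetical label set and keywords; the source does not list its classes.
STAGE_KEYWORDS = {
    "tender_notice": ("invitation to bid", "tender notice"),
    "award_notice": ("award notice", "winning bidder"),
}

# Hypothetical entity patterns; the source does not list its entity types.
ENTITY_PATTERNS = {
    "buyer": re.compile(r"Buyer:\s*([^.\n]+)"),
    "budget_cny": re.compile(r"Budget:\s*(\d+(?:\.\d+)?)\s*CNY"),
}


def recognize_text(document: Dict[str, str]) -> str:
    """Stand-in for the OCR stage: returns text already stored on the record."""
    return document.get("text", "")


def classify_stage(text: str) -> str:
    """Stand-in for the TextCNN classifier: the first keyword match wins."""
    lowered = text.lower()
    for label, keywords in STAGE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "other"


def extract_entities(text: str) -> Dict[str, str]:
    """Stand-in for the BERT-based NER model: regex capture of labeled fields."""
    entities = {}
    for name, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            entities[name] = match.group(1).strip()
    return entities


def to_structured_record(document: Dict[str, str]) -> Dict[str, object]:
    """Run the three stages in order and merge their outputs into one record."""
    text = recognize_text(document)
    return {
        "source_id": document.get("id", ""),
        "stage": classify_stage(text),
        "entities": extract_entities(text),
    }


sample = {
    "id": "doc-1",
    "text": "Invitation to bid. Buyer: City Hospital. Budget: 1200000 CNY.",
}
record = to_structured_record(sample)
print(record)
```

In a real deployment each stand-in would be replaced by a call to the corresponding trained model, while the merging step in `to_structured_record` would stay roughly the same.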
Provider:
贵阳高新数通信息有限公司
Created:
2025-01-07
Collected summary
Dataset introduction
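Features
This dataset is a commercial mining and analysis model for national bidding and tendering data. The data is 37 GB in size and is updated daily. It combines three types of models (bidding process classification, named entity recognition, and OCR text recognition) to identify and analyze bidding documents, and it suits business scenarios such as enterprise procurement demand analysis, competitor assessment, and business opportunity discovery.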
The above content was collected and summarized by 遇见数据集.