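Catalog-Industry Matching Degree Analysis Dataset (目录--行业匹配度分析数据集)
Source: Guizhou Provincial Data Intellectual Property Registration Platform (贵州省数据知识产权登记平台). Updated 2025-12-23, indexed 2025-12-24.
Download link: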
https://gzdipp.gzsis.cn:12020/noticeDetail?id=2033&type=1
Resource description:
1. Data collection: Extract the basic enterprise information already aggregated on the enterprise e-marketplace platform, specifically each enterprise's catalog, industry positioning, and main business.
2. Data processing: 1) First, apply TF-IDF vectorization to the catalog and industry texts (converting text into a computable numerical form that captures its core features), then compute the text similarity between the two as a preliminary measure of how well the classification matches. 2) Next, apply a weighted correction using main-business keywords: weights are assigned according to business relevance, and the weighted calculation outputs a matching score in the range 0 to 1, where values closer to 1 indicate a better match and values closer to 0 a poorer one. The system also automatically tags the reason for each score adjustment (for example, insufficient text semantic similarity, unmatched core business keywords, or differences in feature weights) so that the scoring logic can be traced. 3) Finally, set a matching-score threshold and automatically generate a pending-review list of the entries that score below it.
3. Data application: Supports internal calibration of enterprise classification within the industry and commerce administration system, ensuring that classifications are accurate and standardized; provides credit reporting agencies with a consistency check on enterprise classification, safeguarding the reliability of classification data in credit records; and gives industry analysis institutions data quality assurance for research based on enterprise classification.
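The processing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the provider's implementation: the record fields (`name`, `catalog`, `industry`, `main_business`, `keywords`), the whitespace tokenizer, the 0.7/0.3 blend of text similarity and keyword score, the 0.3 low-similarity cutoff, and the 0.5 review threshold are all hypothetical, since the source does not publish its schema, weights, or threshold. Cosine similarity is assumed as the text similarity measure.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; Chinese text would need a word segmenter.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: terms that occur in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: c / len(toks) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def score_records(records, text_weight=0.7, low_similarity=0.3):
    """Return (name, score, reasons) per record; all weights and cutoffs are illustrative."""
    n = len(records)
    # Fit TF-IDF over every catalog and industry text so IDF reflects the whole corpus.
    vecs = tfidf_vectors([r["catalog"] for r in records] + [r["industry"] for r in records])
    results = []
    for i, r in enumerate(records):
        similarity = cosine(vecs[i], vecs[n + i])
        business = set(tokenize(r["main_business"]))
        total = sum(r["keywords"].values())
        matched = sum(w for kw, w in r["keywords"].items() if kw in business)
        keyword_score = matched / total if total else 0.0
        # Weighted correction: blend text similarity with the keyword score, clamp to [0, 1].
        score = text_weight * similarity + (1.0 - text_weight) * keyword_score
        score = min(1.0, max(0.0, score))
        reasons = []
        if similarity < low_similarity:
            reasons.append("insufficient text similarity")
        if matched == 0:
            reasons.append("core business keywords unmatched")
        results.append((r["name"], score, reasons))
    return results


def review_queue(results, threshold=0.5):
    """Entries whose score falls below the threshold go to the pending-review list."""
    return [item for item in results if item[1] < threshold]


# Hypothetical sample records, for illustration only.
records = [
    {"name": "A", "catalog": "software development services",
     "industry": "software development services",
     "main_business": "cloud software development",
     "keywords": {"software": 2.0, "cloud": 1.0}},
    {"name": "B", "catalog": "office furniture",
     "industry": "catering services",
     "main_business": "restaurant operation",
     "keywords": {"catering": 2.0, "food": 1.0}},
]
results = score_records(records)
pending = review_queue(results)
```

In this sketch, record A (identical catalog and industry text, all keywords present) scores close to 1, while record B (no shared terms, no keyword hits) scores 0 and lands in the pending-review list with both reasons tagged. A linear blend is used only because it keeps the score inside 0 to 1 and makes each adjustment reason easy to trace; the actual correction formula may differ.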
Provider:
贵州梵云大数据集团有限公司
Created:
2025-12-19
Collected summary
Dataset introduction

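Background and challenges
Background overview
This dataset is a public data resource for industry matching-degree analysis. It contains 2,600 records and is updated annually. Using TF-IDF vectorization and a weighted-correction algorithm, it computes a matching score between each enterprise's catalog and its industry. Its main uses are calibrating enterprise classification in the industry and commerce administration system, data verification for credit reporting agencies, and industry analysis research, with the aim of making enterprise classification more accurate and standardized.
The content above was collected and summarized by 遇见数据集.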



