Situation Awareness and Classification of Fintech Products and Institutions in China: Algorithm Programs and Model Files Dataset
Indexed by the National Basic Science Data Center on 2025-11-29
Download link:
https://nbsdc.cn/general/dataDetail?id=6925d29e195d26651c431a3e&type=1
Resource description:
This dataset contains a deep-learning model that classifies fintech products and institutions. It is intended for algorithm research and model validation in situation awareness and risk identification. The training data consist of business information, product descriptions, and technical-feature texts from major fintech enterprises in China, covering the period from May 2023 to March 2025. Spatially, the data cover fintech enterprises in key regions nationwide, at a resolution ranging from the provincial to the prefecture-level city level.

The model was developed in Python on the PyTorch deep learning framework and was trained and validated in multiple stages using GPU-parallel computation. The dataset consists of the core model file financial_classifier.pth and three supporting subdirectories: institution (institution classification), product (product classification), and tokenizer (text feature encoding).

Training used cross-validation and early stopping. To improve robustness and generalization, the multi-source samples were cleaned, normalized, and converted into semantic vectors. For quality control, the training process was logged and manually reviewed to keep the model weights traceable, the results stable, and the accuracy reliable.

The dataset is approximately 1.57 GB in size. Its structured, reusable layout allows it to be applied directly in research on intelligent fintech identification, risk early warning, and regulatory decision support. It offers a verifiable and extensible technical foundation for building China's fintech risk identification system and for refining intelligent regulatory models.
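The description mentions cross-validation and early stopping but does not give the number of folds, the patience, or the stopping metric. The following is a minimal, dependency-free Python sketch of how the two mechanisms are commonly combined. It assumes k contiguous folds and patience-based stopping on validation loss. The names k_fold_indices, EarlyStopping, and run_cv are hypothetical and are not part of the released files. A caller-supplied val_loss_fn stands in for the actual PyTorch training and evaluation step.

```python
from typing import Callable, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds; return (train, val) pairs.

    Hypothetical helper: no shuffling or class stratification is applied here.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits


class EarlyStopping:
    """Signal a stop when validation loss has not improved by more than
    min_delta for `patience` consecutive epochs (assumed criterion)."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss: float, epoch: int) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_cv(
    n_samples: int,
    k: int,
    max_epochs: int,
    val_loss_fn: Callable[[int, int], float],
    patience: int = 3,
) -> List[Tuple[int, float]]:
    """For each fold, iterate epochs until early stopping; return (best_epoch, best_loss) per fold."""
    results = []
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(n_samples, k)):
        stopper = EarlyStopping(patience=patience)
        for epoch in range(max_epochs):
            # A real pipeline would train one epoch on train_idx and then
            # evaluate on val_idx; val_loss_fn stands in for both steps here.
            loss = val_loss_fn(fold, epoch)
            if stopper.step(loss, epoch):
                break
        results.append((stopper.best_epoch, stopper.best))
    return results
```

For example, with the synthetic loss curve val_loss_fn = lambda fold, epoch: (epoch - 4) ** 2 + fold, run_cv(30, 3, 20, val_loss_fn) stops every fold at epoch 7 and returns [(4, 0), (4, 1), (4, 2)]. A production pipeline would usually shuffle or stratify the folds by class. It would also restore the weights from the best epoch rather than keeping those from the last epoch.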
Provided by:
Hunan University of Technology and Business



