Features mined from registration statements for the listing prediction on the STAR market

Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/wkn9ys2zdy/1
Description:
To study whether features of registration statements filed for the Science and Technology Innovation Board (STAR Market) can predict the outcome of listing reviews, we developed IPOhelper, a predictive system for initial public offering (IPO) outcomes that draws on both statistical cues (financial and technological innovation indicators) and semantic cues (textual indicators) in the statements. IPOhelper incorporates eight machine learning models: logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN), naive Bayes (NB), random forest (RF), gradient boosting regression tree (GBRT), XGBoost, and AdaBoost. Among these, AdaBoost performs particularly well in predicting IPO outcomes, and semantic features show notably stronger predictive ability than statistical cues. From the official website of the Shanghai Stock Exchange, we collected 692 registration statements of companies that applied for listing on the STAR Market and received a review outcome between June 2019 and 2023. Of these, 533 belong to companies that listed successfully and 159 to companies that did not. We then used a Python crawler to extract financial, science and technology innovation, and textual disclosure data from the collected statements. The result is 18 indicators for each of the 692 registration statements: 6 financial indicators, 5 scientific and technological innovation indicators, and 7 semantic indicators.
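The counts above imply an imbalanced label: roughly 77% of the statements belong to successfully listed companies, so a model that always predicts "listed" already reaches about that accuracy. The following is a minimal sketch, not part of IPOhelper, that checks the arithmetic in the description and computes this majority-class baseline. The constant names, the dictionary keys, and both function names are hypothetical and introduced only for illustration; the numbers are the ones stated in the description.

```python
# Counts taken from the dataset description.
N_TOTAL = 692
N_LISTED = 533
N_NOT_LISTED = 159

# Indicator groups named in the description (the key names are assumptions).
FEATURE_GROUPS = {"financial": 6, "technological_innovation": 5, "semantic": 7}


def majority_baseline_accuracy(n_major: int, n_minor: int) -> float:
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_major, n_minor) / (n_major + n_minor)


def summarize() -> dict:
    """Return the feature count, positive-class share, and baseline accuracy."""
    # The two outcome classes must account for every registration statement.
    assert N_LISTED + N_NOT_LISTED == N_TOTAL
    return {
        "n_features": sum(FEATURE_GROUPS.values()),
        "positive_share": N_LISTED / N_TOTAL,
        "baseline_accuracy": majority_baseline_accuracy(N_LISTED, N_NOT_LISTED),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, round(value, 4) if isinstance(value, float) else value)
```

When comparing classifiers on this dataset, accuracy figures are easier to interpret against this baseline, and class-sensitive metrics such as recall on the unsuccessful class may be more informative than raw accuracy.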

Provided by:
Central University of Finance and Economics; Renmin University of China