
A Dataset on the Effect of the SMEs Promotion Law on Corporate Innovation Boundaries

Source: DataCite Commons. Updated: 2026-03-26. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=c78de12ace424459a33bc78db3eef740
Description:
This dataset is built from publicly available data on companies listed on the SME Board, the Science and Technology Innovation Board (STAR Market), and the ChiNext Board of China’s A-share market from 2013 to 2022. It is intended to support empirical research on the impact of the SMEs Promotion Law on firms’ innovation boundaries. Data processing involves the following steps. First, financial and corporate governance data are obtained from the CSMAR database, annual report information is collected from the Sina Finance website, and patent data are sourced from the China Patent Database of the China National Intellectual Property Administration; provincial-level macroeconomic data and the Digital Financial Inclusion Index (from the Institute of Digital Finance at Peking University) are then matched to the firm-level records. Second, based on the first four digits of the International Patent Classification (IPC) codes and using a five-year rolling window, the patents each firm applied for are classified into new technology domain patents and existing technology domain patents. The innovation boundary of a firm is measured as the natural logarithm of one plus the number of new technology domain patent applications. The core explanatory variable, implementation of the SMEs Promotion Law, is a dummy variable equal to 1 for 2018 and later years and 0 otherwise. Finally, firms designated ST or *ST, firms in the financial, insurance, and real estate industries, firms in the scientific research and technical services sector, and firms with severe data deficiencies are excluded, yielding an unbalanced panel dataset.
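The innovation boundary measure and the policy dummy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset authors' code: the description does not say exactly how the five-year window is applied, so the sketch assumes a patent counts as "new domain" when its four-character IPC prefix does not appear in the same firm's applications during the five preceding years. The function names, the tuple layout, and the constants `WINDOW` and `POLICY_YEAR` are hypothetical.

```python
import math
from collections import defaultdict

WINDOW = 5          # assumed: look back over the five years before the filing year
POLICY_YEAR = 2018  # the dummy equals 1 from this year onward (per the description)


def innovation_boundary(patents):
    """patents: list of (firm_id, year, ipc_code) tuples, one per application.

    Returns {(firm_id, year): ln(1 + number of new-domain applications)}.
    """
    # IPC four-character prefixes each firm filed under, by year.
    seen = defaultdict(set)
    for firm, year, ipc in patents:
        seen[(firm, year)].add(ipc[:4])

    new_counts = defaultdict(int)
    for firm, year, ipc in patents:
        # Prefixes the firm used in the preceding WINDOW years.
        prior = set()
        for lag in range(1, WINDOW + 1):
            prior |= seen.get((firm, year - lag), set())
        # Adding a bool keeps firm-years with zero new-domain patents in the result.
        new_counts[(firm, year)] += ipc[:4] not in prior

    return {key: math.log1p(n) for key, n in new_counts.items()}


def post_law(year):
    """Policy dummy: 1 for POLICY_YEAR and later, 0 otherwise."""
    return int(year >= POLICY_YEAR)
```

Under this assumption, a firm's earliest years in the patent data have no history, so every application in those years is counted as new domain. Anyone replicating the variable should check how the original study handles that start-up period.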
All continuous variables are winsorized at the 1st and 99th percentiles prior to regression analysis to mitigate the influence of outliers.
The dataset spans 2013 to 2022 (ten annual periods) and covers A-share listed SMEs across the provinces of mainland China. It comprises 11,838 observations and 142 variables in panel format, where each row is one firm in one year. Rows are uniquely identified by the combination of the firm identifier (stock code) and year. Columns fall into the following major categories: identification variables (stock code, stock abbreviation, year, industry code); policy variables (binary indicator for the SMEs Promotion Law, SME development plan, government procurement); Difference-in-Differences (DID) model variables (multiple sets of treatment group indicators, time dummies, and interaction terms corresponding to policy dimensions such as intellectual property, technology-based finance, court establishment, and public services); innovation variables (innovation boundary, exploitative innovation, ambidextrous innovation balance, artificial intelligence level, AI patents, technological diversification); financial and governance variables (profitability, growth, firm size, ownership concentration, nature of property rights, board size, proportion of independent directors); and macro-environmental variables (marketization index, digital economy level, regional innovation index, digital finance index). Monetary indicators (e.g., operating revenue) are denominated in Renminbi (RMB), ratio-type indicators are given as percentages or decimals, and quantities such as the number of patents and the number of employees are reported as raw counts. Because of listing dates and incomplete disclosure in certain years, some variables have missing observations.
Missing values are not imputed; instead, observations with severe missingness are excluded and panel fixed-effects models are used. Because the data come primarily from publicly disclosed corporate annual reports, patent databases, and reputable commercial databases, overall data quality is reliable. Minor measurement errors may nevertheless exist, arising from the original data collection process and from changes in industry classification standards.
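For readers who want to reproduce the winsorization of continuous variables at the 1st and 99th percentiles mentioned in the description, a minimal standard-library sketch follows. It rests on assumptions the description does not settle: percentiles are linearly interpolated, they are computed over the pooled sample rather than year by year, and missing values (represented here as `None`) are left untouched, in line with the statement that no values are imputed. The function names `percentile` and `winsorize` are hypothetical.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated p-th percentile (0 <= p <= 100) of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, lower=1, upper=99):
    """Clip values to the [lower, upper] percentile range; None entries pass through."""
    observed = sorted(v for v in values if v is not None)
    if not observed:
        return list(values)  # nothing to clip
    lo_cut = percentile(observed, lower)
    hi_cut = percentile(observed, upper)
    return [None if v is None else min(max(v, lo_cut), hi_cut) for v in values]
```

Statistical packages use slightly different percentile definitions, so cut-offs computed this way may differ marginally from those behind the published variables.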
Provider:
Science Data Bank
Created:
2026-03-26