
AI Disclosure and Corporate Misconduct Panel (U.S., 2020–2024)

NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://data.mendeley.com/datasets/nf88fc7f24
Description:
This dataset contains a firm-year panel of large U.S. publicly listed companies observed over the period 2020–2024. The panel includes approximately 50–60 firms (balanced structure where available), yielding roughly 250–300 firm-year observations. The sample focuses on large, technology-intensive and non-financial corporations for which artificial intelligence (AI) disclosure became strategically salient during this period.
The primary purpose of the dataset is to examine the relationship between AI disclosure intensity and corporate misconduct, as well as the moderating role of executive power and governance oversight.
AI disclosure intensity is measured using a deterministic dictionary-based count of AI-related terms extracted from annual Form 10-K filings. The counting procedure applies exact, case-insensitive string matching rules to a pre-specified list of AI-related stems (e.g., “artificial intelligen,” “machine learn,” “deep learn,” “neural network,” “algorithm,” “automation,” “predict,” “analytics,” “natural language,” “computer vision,” “autonomous”). The procedure does not involve semantic inference, classification, or machine learning. It produces an annual firm-level count of AI-related mentions, which serves as a proxy for AI salience in corporate disclosure.
Corporate misconduct is measured as the annual count of regulatory enforcement actions associated with each firm, aggregated at the firm-year level and transformed as ln(1 + count) in empirical analyses.
Governance oversight is proxied using a count of governance-risk disclosure phrases in 10-K filings (e.g., “material weakness,” “restatement,” “SEC investigation,” “internal control deficiency,” “compliance failure”).
Executive power is measured using CEO duality (indicator equal to 1 if the CEO also serves as board chair).
The dataset also includes financial control variables such as total assets, profitability (e.g., net income or ROA), leverage, and revenue, as well as sector classifications.
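The dictionary-based count and the ln(1 + count) transform described above can be illustrated with a minimal Python sketch. It is not the dataset authors' code: the function names, the whitespace normalization step, and the choice to count non-overlapping substring matches are assumptions, and the stem list contains only the examples quoted in the description, so the full list may differ.

```python
import math
import re

# Stems quoted in the description; the authors' complete list may differ.
AI_STEMS = [
    "artificial intelligen", "machine learn", "deep learn", "neural network",
    "algorithm", "automation", "predict", "analytics", "natural language",
    "computer vision", "autonomous",
]

def count_stem_mentions(text, stems=AI_STEMS):
    """Count case-insensitive, non-overlapping substring matches of each stem."""
    # Assumption: collapse runs of whitespace so a stem split across a
    # line break in the filing text still matches.
    normalized = re.sub(r"\s+", " ", text.lower())
    return sum(len(re.findall(re.escape(stem), normalized)) for stem in stems)

def log1p_count(count):
    """Apply the ln(1 + count) transform used for the misconduct variable."""
    return math.log1p(count)

sample = (
    "Our Machine Learning platform uses neural networks and predictive "
    "analytics. Algorithms improve automation."
)
ai_mentions = count_stem_mentions(sample)
```

The same function could be applied to the governance-risk phrases by passing a different list as `stems`. Because matching is on substrings, a stem such as "predict" also matches "predictive" and "unpredictable", which is a property of stem matching worth keeping in mind when interpreting the counts.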
Firm and year identifiers are included to facilitate panel estimation with fixed effects. All text-based variables are generated using standardized extraction prompts applied uniformly across firms and years, ensuring full transparency and replicability. The dataset supports replication of analyses examining nonlinear (quadratic) relationships between AI disclosure and misconduct, as well as moderated quadratic models incorporating executive power and governance oversight.
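For orientation, one conventional way to write a moderated quadratic model with firm and year fixed effects is sketched below. This is an assumed specification, not one stated in the dataset description: the symbols AI (disclosure count), M (a moderator, either CEO duality or the governance-oversight count), X (financial controls), and the fixed-effect terms are illustrative notation.

```latex
\ln(1 + \mathrm{Misconduct}_{it})
  = \beta_1 \mathrm{AI}_{it}
  + \beta_2 \mathrm{AI}_{it}^{2}
  + \beta_3 M_{it}
  + \beta_4 \left(\mathrm{AI}_{it} \times M_{it}\right)
  + \beta_5 \left(\mathrm{AI}_{it}^{2} \times M_{it}\right)
  + \boldsymbol{\gamma}^{\top} \mathbf{X}_{it}
  + \alpha_i + \lambda_t + \varepsilon_{it}
```

Here i indexes firms and t indexes years, with firm effects \alpha_i and year effects \lambda_t. Setting \beta_3 = \beta_4 = \beta_5 = 0 gives the plain quadratic model, while \beta_5 captures how the moderator changes the curvature of the relationship between AI disclosure and misconduct.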
Created:
2026-02-19