
Physics-Informed Phenomenology of Safety Invariance in Stochastic Generative AI: Evidence from LLM Moderation Pipelines — Public Companion Deposit

Zenodo. Updated 2026-05-19; indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.20278520
Description:
Public companion deposit for a peer-reviewed journal submission extending the physics-informed phenomenology of safety invariance from physical control systems to a stochastic AI moderation pipeline. The deposit contains the aggregate validation data, derived analysis tables, statistical bounds, figure-generation scripts, and publication-ready figures used in the manuscript.

Headline results. A safety filter wrapping the OpenAI Moderation API (omni-moderation-latest) is exercised on N = 2,735 prompts stratified across four semantic risk categories: Benign (725), Edge Case (725), Adversarial (725), and Crisis (560), at batch configurations κ ∈ [0.1, 2.0] over six experimental batches. Across all 2,735 samples we observe zero post-projection violations of the policy thresholds, with a one-sided Clopper–Pearson 95% upper bound on the per-sample failure probability of 1.35 × 10⁻³. Boundary contact is strongly category-dependent (Benign 0%, Edge Case 0%, Adversarial 1.93%, Crisis 43.21%), and the Crisis category exhibits a boundary accumulation ratio ρ = 8.64 at ε = 5%R relative to a uniform baseline, consistent with a reflecting-boundary regime in the semantic state space.

What is included. Aggregate scores (per-sample numeric records with hashed prompt identifiers and 8-dim moderation score vectors), derived analysis tables (boundary mass fractions, accumulation ratios), statistical bounds (Clopper–Pearson upper bound, intervention rate), figure-generation scripts, and the publication-ready PNG/PDF figures of the manuscript and supplementary information.

What is not included. The deposit follows a three-tier data availability policy.
(1) Open here under CC BY 4.0: aggregate data, scripts, figures.
(2) Access-controlled, request-based: the raw prompt–response set, which contains adversarial and crisis-related content and will be released through a researcher-attestation protocol.
(3) Not released: the internal implementation of the safety layer, which is the subject of pending US patent application USPTO 19/535,932 (international rights under the Paris Convention and PCT reserved).
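
The headline statistics in the description can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the deposit's scripts; the function names are hypothetical and it rests on two assumptions. First, with zero observed failures the Clopper–Pearson upper limit reduces to the closed form p = 1 − tail^(1/N), where `tail` is the upper-tail probability. Second, the accumulation ratio is assumed to be the Crisis boundary-contact fraction divided by the mass a uniform baseline would place in the ε-shell, taken here to be 5%; the description does not define ρ, so this reading is a guess that happens to match the quoted value.

```python
import math

# Stratum sizes as quoted in the description.
STRATA = {"Benign": 725, "Edge Case": 725, "Adversarial": 725, "Crisis": 560}


def zero_failure_upper_bound(n: int, tail: float) -> float:
    """Clopper-Pearson upper limit for 0 failures in n trials.

    With zero failures the exact bound solves (1 - p)**n = tail,
    i.e. p = 1 - tail**(1/n). expm1 keeps precision for small p.
    """
    if n <= 0 or not 0.0 < tail < 1.0:
        raise ValueError("need n > 0 and 0 < tail < 1")
    return -math.expm1(math.log(tail) / n)


def accumulation_ratio(boundary_fraction: float, baseline_fraction: float) -> float:
    """ASSUMED definition: observed boundary mass over uniform-baseline mass."""
    if baseline_fraction <= 0.0:
        raise ValueError("baseline_fraction must be positive")
    return boundary_fraction / baseline_fraction


n_total = sum(STRATA.values())
print("N =", n_total)
# tail = 0.05 is a strictly one-sided 95% bound;
# tail = 0.025 is the upper limit of a two-sided 95% interval.
print(f"upper bound, tail 0.05 : {zero_failure_upper_bound(n_total, 0.05):.3e}")
print(f"upper bound, tail 0.025: {zero_failure_upper_bound(n_total, 0.025):.3e}")
# Crisis boundary contact 43.21% against an assumed 5% uniform-baseline mass.
print("rho =", round(accumulation_ratio(0.4321, 0.05), 2))
```

Under these assumptions the stratum sizes sum to 2,735 and the ratio comes out at 8.64. The quoted bound of 1.35 × 10⁻³ corresponds to a tail probability of 0.025, while a tail of 0.05 gives roughly 1.09 × 10⁻³; the description does not say which convention the manuscript follows, so readers reproducing the figure may want to check the deposited statistical-bounds table.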
Provider:
Zenodo
Created:
2026-05-19