ClarusC64/cfir-stability-intervention-geometry-v0.1

Name: ClarusC64/cfir-stability-intervention-geometry-v0.1
Creator: ClarusC64
Published: 2026-03-27 16:31:13
License: No description available

Hugging Face · Updated 2026-03-27 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/ClarusC64/cfir-stability-intervention-geometry-v0.1

Resource overview:

---
language:
- en
license: mit
task_categories:
- text-classification
tags:
- clarus
- stability-geometry
- intervention-reasoning
- adversarial-benchmark
- quad-coupling
size_categories:
- 1K<n<10K
pretty_name: CFIR v0.1 — Coupled Failure Intervention Reasoning Benchmark
---

# CFIR v0.1: Coupled Failure Intervention Reasoning Benchmark

## What this repo does

CFIR v0.1 is a synthetic benchmark designed to evaluate whether models can reason about stability and intervention geometry in coupled systems.

Many real-world failures occur not because systems lack information, but because they fail to interpret interacting pressures, buffers, delays, and couplings correctly.

This dataset tests whether a model can determine when an intervention will stabilize or fail to stabilize a system state.

The benchmark is intentionally constructed so that no single observable variable strongly correlates with the outcome. Correct predictions require reasoning over interactions between variables.

## Core quad

The dataset models system stability using four interacting signals.

- `pressure`: External or internal stress acting on the system.
- `buffer_capacity`: Available resilience or reserve capacity.
- `intervention_lag`: Delay between instability onset and corrective action.
- `system_coupling`: Degree to which subsystem disturbances propagate through the system.

These variables form a stability state vector.

## Prediction target

Models must predict `label_intervention_stabilizing`.

Meaning:

- 1 → the proposed intervention stabilizes the system
- 0 → the proposed intervention fails to stabilize the system

The label is computed from latent regime interactions rather than a single equation. This prevents models from exploiting simple feature correlations.

## Dataset design principles

The generator uses several mechanisms to enforce reasoning difficulty.

### Latent regime mixing

Multiple hidden stability regimes determine the outcome. The regime identity is not visible in the dataset.
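
A toy illustration of latent regime mixing may help here. The sketch below is not the logic of `generator.py`, which this card does not show. The `SystemState` class, the `toy_label` function, the two regime rules, the 0.5 threshold, and the assumption that signals lie in [0, 1] are all hypothetical and chosen only to show one observable state receiving two labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    """The four observable signals of one scenario (assumed here to lie in [0, 1])."""
    pressure: float
    buffer_capacity: float
    intervention_lag: float
    system_coupling: float


def toy_label(state: SystemState, hidden_regime: int) -> int:
    """Return 1 if the intervention stabilizes under this hypothetical rule, else 0."""
    if hidden_regime == 0:
        # Hypothetical regime 0: buffer helps, pressure amplified by coupling hurts.
        margin = state.buffer_capacity - state.pressure * state.system_coupling
    else:
        # Hypothetical regime 1: buffer feeds a coupling cascade, and lag compounds pressure.
        margin = (
            0.5
            - state.buffer_capacity * state.system_coupling
            - state.pressure * state.intervention_lag
        )
    return 1 if margin > 0 else 0


# One surface state, evaluated under both hidden regimes.
state = SystemState(pressure=0.6, buffer_capacity=0.7, intervention_lag=0.3, system_coupling=0.5)
labels = {regime: toy_label(state, regime) for regime in (0, 1)}
print(labels)  # prints {0: 1, 1: 0}
```

Because the regime is never written to the rows, a model sees only the four signals and has to infer the likely regime from their joint configuration.
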
The same surface state can therefore produce different outcomes depending on the hidden regime.

### Nonlinear interaction geometry

Outcomes depend on interaction surfaces such as

- pressure × coupling
- pressure × buffer
- buffer × coupling
- pressure × lag

This forces models to learn joint geometry rather than single-variable rules.

### Polarity inversion

Variables sometimes push toward collapse and sometimes toward stabilization.

Example:

- high buffer can stabilize
- high buffer can also collapse under coupling cascade

This removes global directional signals.

### Feature correlation suppression

The generator is designed so that individual feature correlations with the label remain close to zero.

Typical values:

| variable | correlation |
| --- | --- |
| `pressure` | ~0.08 |
| `buffer_capacity` | ~0.03 |
| `intervention_lag` | ~0.20 |
| `system_coupling` | ~0.13 |

This ensures the task cannot be solved through simple heuristics.

## Row structure

Each row represents a system state and a proposed intervention.

Fields:

- `scenario_id`: Unique scenario identifier
- `pressure`: System pressure level
- `buffer_capacity`: Available resilience capacity
- `intervention_lag`: Delay between instability detection and response
- `system_coupling`: Strength of cross-subsystem propagation
- `proposed_intervention`: Suggested stabilization action
- `label_intervention_stabilizing`: Ground truth outcome (hidden in tester set)

## Files

- `data/train.csv`: Training dataset with labels.
- `data/tester.csv`: Evaluation dataset without labels.
- `data/tester_key.csv`: Hidden answer key used by the scorer.
- `generator.py`: Synthetic dataset generator.
- `prediction_baseline.py`: Example baseline predictor.
- `scorer.py`: Evaluation script.

## Evaluation

The scorer reports standard classification metrics.

- accuracy
- precision
- recall
- f1

Two additional diagnostics are included.

- `recall_stabilizing_interventions`: Ability to detect interventions that actually stabilize the system.
- `false_effective_intervention_rate`: Frequency of predicting stabilization when collapse still occurs.
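
The metrics listed above could be computed along the following lines. This is a minimal sketch, not the code in `scorer.py`, which this card does not show. It assumes label 1 is the positive class, that `recall_stabilizing_interventions` is recall on class 1, and that `false_effective_intervention_rate` is false positives divided by all rows where collapse actually occurs; the real scorer may use a different denominator. The function name `score_predictions` and the sample labels are hypothetical.

```python
def score_predictions(y_true, y_pred):
    """Compute the metrics named on this card from two equal-length 0/1 sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Assumption: recall on the stabilizing class (label 1).
        "recall_stabilizing_interventions": recall,
        # Assumption: predicted-stabilizing share of rows that actually collapse.
        "false_effective_intervention_rate": ratio(fp, fp + tn),
    }


# Made-up labels for illustration only; not taken from the dataset.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
result = score_predictions(y_true, y_pred)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Reporting the false-effective rate separately from precision keeps attention on the costly error: acting on an intervention that does not prevent collapse.
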
These metrics emphasize correct stabilization reasoning rather than generic classification performance.

## Baseline behavior

The provided baseline predictor intentionally performs near chance.

Typical scores:

- accuracy ≈ 0.45–0.55

This reflects the adversarial design of the dataset. Models must learn nonlinear interactions to improve performance.

## Intended research use

CFIR v0.1 can be used to study

- stability reasoning
- intervention decision modeling
- interaction learning in structured state spaces
- robustness to regime shifts

The included generator allows researchers to produce larger datasets or probe additional failure patterns.

## Structural note

CFIR is part of a broader effort to benchmark state-space intelligence.

Most modern AI benchmarks evaluate content interpretation. CFIR instead evaluates whether models can reason about system stability geometry.

Instability is often detectable before collapse occurs. The challenge is recognizing when intervention will succeed.

## License

Provider:

ClarusC64
