ClarusC64/clinical-quad-infection-buffer-lag-coupling-sepsis-transition-v1.3

Name: ClarusC64/clinical-quad-infection-buffer-lag-coupling-sepsis-transition-v1.3
Creator: ClarusC64
Published: 2026-03-23 19:30:27
License: No description available

Hugging Face, updated 2026-03-23, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/ClarusC64/clinical-quad-infection-buffer-lag-coupling-sepsis-transition-v1.3


Resource summary:

---
language: en
license: mit
task_categories:
- text-classification
tags:
- clinical-trials
- quad-coupling
- failure-reconstruction
- clarus
- clarus-v1.3
- sepsis-transition
size_categories:
- 1K<n<10K
pretty_name: Clinical Quad Infection Buffer Lag Coupling Sepsis Transition v1.3
---

# Clinical Quad Infection Buffer Lag Coupling Sepsis Transition v1.3

## Benchmark definition

- Benchmark family: Clarus
- Benchmark layer: v1.3
- Geometry type: Failure Reconstruction Geometry
- Domain: Clinical stability systems
- Structure: Quad coupling instability model

Primary question: Can a model reconstruct the causal pathway that produced a failure state?

Evaluation requires identifying:

- the ordered failure decision chain
- the root policy error
- the counterfactual recovery step

## Clarus benchmark suite

This dataset is part of the Clarus benchmark suite. Clarus benchmarks evaluate whether machine learning systems can reason about the stability of complex coupled systems.

Most machine learning benchmarks measure prediction accuracy. Clarus benchmarks evaluate whether models can understand:

- system state
- instability trajectories
- intervention effects
- causal failure pathways

Each dataset represents one geometric layer of system stability reasoning. Together, the ladder forms a structured evaluation of stability reasoning.

## Research context

The Clarus benchmark program investigates how machine learning systems reason about stability in complex coupled systems. Many real-world failures occur not because a system cannot detect events, but because it cannot correctly reconstruct how a cascade unfolded.
Examples include:

- clinical deterioration
- infrastructure collapse
- financial contagion
- distributed system failures

In these environments, useful intelligence requires understanding:

- the structure of the system
- how instability propagates
- where the first reversible error occurred
- which intervention could have prevented collapse

The Clarus benchmark ladder evaluates these capabilities progressively. Each layer introduces a new level of structural reasoning about system dynamics.

## What this repo does

This repository contains a Clarus **v1.3 benchmark dataset**.

The v1.3 layer introduces **Failure Reconstruction Geometry**.

Earlier Clarus layers evaluate:

- system state
- trajectory dynamics
- intervention selection
- control sequence correctness
- temporal policy stability

v1.3 evaluates whether a model can reconstruct the **causal path that produced a failure state**.

The benchmark asks:

- what policy error initiated the cascade
- how the failure propagated
- which intervention would have prevented collapse

This benchmark evaluates **causal failure understanding**, not just failure detection.

## Core quad

The system state is represented using a four-variable coupling model:

- infection_load
- buffer_capacity
- lag_burden
- coupling_stress

## Clinical variable mapping

| Quad variable | Clinical measurements | Typical indicators |
|---|---|---|
| infection_load | infectious burden, pathogen load | bacteremia, uncontrolled infection source |
| buffer_capacity | physiological reserve | immune response capacity, perfusion reserve |
| lag_burden | delayed intervention | delayed antibiotics, delayed source control |
| coupling_stress | systemic cross-organ stress | inflammatory spillover, organ interaction stress |

## Failure reconstruction geometry

Failure reconstruction models the sequence of policy and physiological events that lead to collapse.

Example chain:

`delayed_antibiotics > infection_escalation > septic_instability > organ_failure`

Order matters: the same steps arranged in a different sequence describe a different causal pathway.
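Because order matters, chains are compared as ordered sequences rather than as sets. Below is a minimal sketch of this section's LCS-based overlap score. It assumes that chains are written as `>`-separated strings, as in the example chain. The function names `parse_chain`, `lcs_length`, and `chain_overlap` are hypothetical; this is not the repository's `scorer.py`. Scoring an empty ground-truth chain as 0 is also an assumption.

```python
from typing import List, Sequence


def parse_chain(chain: str) -> List[str]:
    """Split a '>'-separated failure chain into an ordered list of steps."""
    return [step.strip() for step in chain.split(">") if step.strip()]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def chain_overlap(predicted_chain: str, true_chain: str) -> float:
    """Ordered overlap: LCS length divided by the ground-truth chain length."""
    true_steps = parse_chain(true_chain)
    if not true_steps:
        return 0.0  # assumption: an empty ground-truth chain scores 0
    return lcs_length(parse_chain(predicted_chain), true_steps) / len(true_steps)


if __name__ == "__main__":
    truth = "delayed_antibiotics > infection_escalation > septic_instability > organ_failure"
    # Same four steps, but the last two are swapped
    pred = "delayed_antibiotics > infection_escalation > organ_failure > septic_instability"
    print(chain_overlap(pred, truth))  # 0.75
```

In the usage example, the swapped prediction still shares an ordered subsequence of three steps with the ground truth, so it scores 3/4 = 0.75. A set-based comparison would give it full credit.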
Reconstruction scoring evaluates how closely a predicted chain matches the ground-truth chain. Ordered chain overlap is computed using the longest common subsequence (LCS).

Scoring principle:

`overlap = length(LCS(predicted_chain, true_chain)) / length(true_chain)`

The score ranges from 0, when no steps are shared in order, to 1, when every ground-truth step appears in the predicted chain in the correct order. Extra predicted steps are not penalized by this ratio.

## Signals

The dataset includes the following reconstruction signals:

- failure_decision_sequence
- failure_path_length
- cascade_amplification_factor
- recovery_window_width

Labels:

- label_root_policy_error
- label_counterfactual_recovery_step
- label_failure_reconstruction

## Example row

| Signal | Example value |
|---|---|
| infection_load | 0.84 |
| buffer_capacity | 0.30 |
| lag_burden | 0.66 |
| coupling_stress | 0.72 |
| failure_decision_sequence | delayed_antibiotics > infection_escalation > septic_instability |
| failure_path_length | 3 |
| cascade_amplification_factor | 0.78 |
| recovery_window_width | 0.21 |
| label_root_policy_error | delayed_antibiotics |
| label_counterfactual_recovery_step | early_antibiotics |
| label_failure_reconstruction | 1 |

## Files

The repository contains:

- `data/train.csv`
- `data/tester.csv`
- `scorer.py`
- `benchmark_spec.json`
- `dataset_schema.json`
- `README.md`

## Evaluation

Primary metric: failure_reconstruction_accuracy

Secondary metric: false_failure_reconstruction_rate

Binary metrics:

- accuracy
- precision
- recall
- f1

Diagnostics:

- high_amplification_chain_miss_rate
- narrow_recovery_window_miss_rate
- root_policy_error_miss_rate

## Benchmark protocol

Evaluation follows the standard Clarus benchmark protocol.

1. Train a model using `data/train.csv`.
2. Generate predictions for the tester dataset (`data/tester.csv`).
3. Write a submission file that contains the required prediction columns.
4. Score the predictions with `scorer.py`.
5. The scorer produces a structured JSON evaluation report containing:
   - binary classification metrics
   - reconstruction accuracy metrics
   - diagnostic failure metrics

Primary evaluation metric: failure_reconstruction_accuracy

Secondary evaluation metric: false_failure_reconstruction_rate

## Intended use

This dataset is intended for:

- causal failure reconstruction benchmarking
- policy accountability analysis
- post-collapse reasoning evaluation
- evaluation of structured reasoning in machine learning systems

## Limitations

This dataset evaluates structural causal reconstruction rather than full clinical fidelity.

Failure chains are benchmark artifacts representing modeled collapse dynamics. They should not be interpreted as literal clinical treatment protocols.

This dataset is **not intended for direct clinical decision-making**.

## Position in Clarus ladder

Clarus benchmark layers:

- v0.1 cascade state detection
- v0.2 trajectory detection
- v0.3 dynamic cascade forecasting
- v0.4 boundary discovery
- v0.5 recovery geometry
- v0.6 intervention pathway reasoning
- v0.7 uncertainty geometry
- v0.8 regime transition geometry
- v0.9 intervention competition geometry
- v1.0 closed-loop control geometry
- v1.1 adaptive policy stability
- v1.2 temporal memory geometry
- **v1.3 failure reconstruction geometry**

## License

MIT

Provided by:

ClarusC64
