ClarusC64/clinical-quad-infection-buffer-lag-coupling-sepsis-transition-v1.3
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ClarusC64/clinical-quad-infection-buffer-lag-coupling-sepsis-transition-v1.3
Resource description:
---
language: en
license: mit
task_categories:
- text-classification
tags:
- clinical-trials
- quad-coupling
- failure-reconstruction
- clarus
- clarus-v1.3
- sepsis-transition
size_categories:
- 1K<n<10K
pretty_name: Clinical Quad Infection Buffer Lag Coupling Sepsis Transition v1.3
---
# Clinical Quad Infection Buffer Lag Coupling Sepsis Transition v1.3
## Benchmark definition
Benchmark family: Clarus
Benchmark layer: v1.3
Geometry type: Failure Reconstruction Geometry
Domain: Clinical stability systems
Structure: Quad coupling instability model
Primary question:
Can a model reconstruct the causal pathway that produced a failure state?
Evaluation requires identifying:
- the ordered failure decision chain
- the root policy error
- the counterfactual recovery step
## Clarus benchmark suite
This dataset is part of the Clarus benchmark suite.
Clarus benchmarks evaluate whether machine learning systems can reason about the stability of complex coupled systems.
Most machine learning benchmarks measure prediction accuracy.
Clarus benchmarks evaluate whether models can understand:
- system state
- instability trajectories
- intervention effects
- causal failure pathways
Each dataset represents one geometric layer of system stability reasoning.
Together, the layers form a ladder that evaluates stability reasoning in a structured, progressive way.
## Research context
The Clarus benchmark program investigates how machine learning systems reason about stability in complex coupled systems.
Many real-world failures occur not because a system cannot detect events, but because it cannot correctly reconstruct how a cascade unfolded.
Examples include:
- clinical deterioration
- infrastructure collapse
- financial contagion
- distributed system failures
In these environments, useful intelligence requires understanding:
- the structure of the system
- how instability propagates
- where the first reversible error occurred
- which intervention could have prevented collapse
The Clarus benchmark ladder evaluates these capabilities progressively.
Each layer introduces a new level of structural reasoning about system dynamics.
## What this repo does
This repository contains a Clarus **v1.3 benchmark dataset**.
The v1.3 layer introduces **Failure Reconstruction Geometry**.
Earlier Clarus layers evaluate:
- system state
- trajectory dynamics
- intervention selection
- control sequence correctness
- temporal policy stability
v1.3 evaluates whether a model can reconstruct the **causal path that produced a failure state**.
The benchmark asks:
- what policy error initiated the cascade
- how the failure propagated
- which intervention would have prevented collapse
This benchmark evaluates **causal failure understanding**, not just failure detection.
## Core quad
The system state is represented using a four-variable coupling model.
- infection_load
- buffer_capacity
- lag_burden
- coupling_stress
## Clinical variable mapping
| Quad Variable | Clinical Measurements | Typical Indicators |
|---|---|---|
| infection_load | infectious burden, pathogen load | bacteremia, uncontrolled infection source |
| buffer_capacity | physiological reserve | immune response capacity, perfusion reserve |
| lag_burden | delayed intervention | delayed antibiotics, delayed source control |
| coupling_stress | systemic cross-organ stress | inflammatory spillover, organ interaction stress |
## Failure reconstruction geometry
Failure reconstruction models the sequence of policy and physiological events that lead to collapse.
Example chain:
`delayed_antibiotics > infection_escalation > septic_instability > organ_failure`
Order matters.
Reconstruction scoring evaluates how closely a predicted chain matches the ground truth chain.
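Ordered chain overlap is computed from the length of the longest common subsequence (LCS) of the predicted and true chains, so steps only count when they appear in the correct relative order.
Scoring principle:
`overlap = LCS(predicted_chain, true_chain) / length(true_chain)`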
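The scoring principle above can be sketched in Python. This is a minimal illustration, not the repository's `scorer.py`: the function names `parse_chain`, `lcs_length`, and `chain_overlap` are hypothetical, and the sketch assumes chains are serialized with `>` as the step separator, as in the example chain.

```python
def parse_chain(chain: str) -> list[str]:
    """Split an 'a > b > c' string into an ordered list of step names."""
    return [step.strip() for step in chain.split(">") if step.strip()]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def chain_overlap(predicted: str, true: str) -> float:
    """overlap = LCS(predicted_chain, true_chain) / length(true_chain)."""
    true_steps = parse_chain(true)
    if not true_steps:
        return 0.0  # assumption: an empty true chain scores zero
    return lcs_length(parse_chain(predicted), true_steps) / len(true_steps)


true_chain = "delayed_antibiotics > infection_escalation > septic_instability > organ_failure"
predicted_chain = "delayed_antibiotics > septic_instability > organ_failure"
print(chain_overlap(predicted_chain, true_chain))  # 0.75
```

The prediction skips `infection_escalation` but keeps the remaining three steps in order, so it scores 3/4. A prediction that names the right steps in the wrong order scores lower, because a subsequence must preserve order.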
## Signals
The dataset includes the following reconstruction signals:
- failure_decision_sequence
- failure_path_length
- cascade_amplification_factor
- recovery_window_width
Labels:
- label_root_policy_error
- label_counterfactual_recovery_step
- label_failure_reconstruction
## Example row
| Signal | Example Value |
|---|---|
| infection_load | 0.84 |
| buffer_capacity | 0.30 |
| lag_burden | 0.66 |
| coupling_stress | 0.72 |
| failure_decision_sequence | delayed_antibiotics > infection_escalation > septic_instability |
| failure_path_length | 3 |
| cascade_amplification_factor | 0.78 |
| recovery_window_width | 0.21 |
| label_root_policy_error | delayed_antibiotics |
| label_counterfactual_recovery_step | early_antibiotics |
| label_failure_reconstruction | 1 |
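As a rough illustration, the example row can be held as a Python dictionary and checked for internal consistency. The three checks below are assumptions inferred from this single example row (for instance, that `failure_path_length` counts the steps in `failure_decision_sequence`); they are not documented rules of the dataset schema.

```python
QUAD = ("infection_load", "buffer_capacity", "lag_burden", "coupling_stress")

# Values copied from the example row above.
row = {
    "infection_load": 0.84,
    "buffer_capacity": 0.30,
    "lag_burden": 0.66,
    "coupling_stress": 0.72,
    "failure_decision_sequence": "delayed_antibiotics > infection_escalation > septic_instability",
    "failure_path_length": 3,
    "cascade_amplification_factor": 0.78,
    "recovery_window_width": 0.21,
    "label_root_policy_error": "delayed_antibiotics",
    "label_counterfactual_recovery_step": "early_antibiotics",
    "label_failure_reconstruction": 1,
}

steps = [s.strip() for s in row["failure_decision_sequence"].split(">")]

# Hypothetical sanity checks; each one holds for this row but is not a stated schema rule.
checks = {
    "path_length_matches_sequence": len(steps) == row["failure_path_length"],
    "root_error_is_first_step": steps[0] == row["label_root_policy_error"],
    "quad_values_in_unit_interval": all(0.0 <= row[name] <= 1.0 for name in QUAD),
}
print(checks)
```

For this row all three checks evaluate to `True`. Whether they hold across the full dataset would need to be confirmed against `dataset_schema.json`.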
## Files
The repository contains:
- data/train.csv
- data/tester.csv
- scorer.py
- benchmark_spec.json
- dataset_schema.json
- README.md
## Evaluation
Primary metric:
failure_reconstruction_accuracy
Secondary metric:
false_failure_reconstruction_rate
Binary metrics:
- accuracy
- precision
- recall
- f1
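The four binary metrics follow their standard definitions. The sketch below shows how they could be computed for the binary label `label_failure_reconstruction`; the function name `binary_metrics` and the toy label vectors are illustrative assumptions, and the actual `scorer.py` may differ in detail (for example, in how it handles zero denominators).

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not taken from the dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Here there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.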
Diagnostics:
- high_amplification_chain_miss_rate
- narrow_recovery_window_miss_rate
- root_policy_error_miss_rate
## Benchmark protocol
Evaluation follows the standard Clarus benchmark protocol.
1. Train a model using `data/train.csv`.
2. Generate predictions for the tester dataset (`data/tester.csv`).
3. The submission file must contain the required prediction columns.
4. The scorer evaluates predictions using `scorer.py`.
5. The scorer produces a structured JSON evaluation report containing:
   - binary classification metrics
   - reconstruction accuracy metrics
   - diagnostic failure metrics
Primary evaluation metric:
failure_reconstruction_accuracy
Secondary evaluation metric:
false_failure_reconstruction_rate
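Steps 1 to 3 of the protocol can be sketched with a trivial majority-class baseline using only the Python standard library. Everything beyond the label name is an assumption: the submission columns (`row_index` plus the label) are hypothetical because this card does not list the required prediction columns, and the in-memory CSV text is toy data standing in for `data/train.csv` and `data/tester.csv`.

```python
import csv
import io
from collections import Counter

LABEL = "label_failure_reconstruction"  # label column named in this card


def majority_label(train_file) -> str:
    """Step 1 stand-in: 'train' by finding the most common label value."""
    labels = [row[LABEL] for row in csv.DictReader(train_file)]
    return Counter(labels).most_common(1)[0][0]


def write_submission(test_file, out_file, predicted_label: str) -> int:
    """Steps 2-3: write one prediction per tester row; returns the row count."""
    writer = csv.writer(out_file)
    writer.writerow(["row_index", LABEL])  # hypothetical submission columns
    count = 0
    for i, _ in enumerate(csv.DictReader(test_file)):
        writer.writerow([i, predicted_label])
        count += 1
    return count


# Toy stand-ins; with the real repository, open data/train.csv and data/tester.csv instead.
train_csv = io.StringIO(f"infection_load,{LABEL}\n0.84,1\n0.20,0\n0.77,1\n")
tester_csv = io.StringIO("infection_load\n0.65\n0.31\n")
submission = io.StringIO()

label = majority_label(train_csv)
n_rows = write_submission(tester_csv, submission, label)
print(n_rows, "predictions written with label", label)
```

A baseline like this only establishes a floor for the binary metrics. It says nothing about chain reconstruction, which requires predicting the ordered failure sequence itself.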
## Intended use
This dataset is intended for:
- causal failure reconstruction benchmarking
- policy accountability analysis
- post-collapse reasoning evaluation
- evaluation of structured reasoning in machine learning systems
## Limitations
This dataset evaluates structural causal reconstruction rather than full clinical fidelity.
Failure chains are benchmark artifacts representing modeled collapse dynamics.
They should not be interpreted as literal clinical treatment protocols.
This dataset is **not intended for direct clinical decision making**.
## Position in Clarus ladder
Clarus benchmark layers:
- v0.1 cascade state detection
- v0.2 trajectory detection
- v0.3 dynamic cascade forecasting
- v0.4 boundary discovery
- v0.5 recovery geometry
- v0.6 intervention pathway reasoning
- v0.7 uncertainty geometry
- v0.8 regime transition geometry
- v0.9 intervention competition geometry
- v1.0 closed-loop control geometry
- v1.1 adaptive policy stability
- v1.2 temporal memory geometry
- **v1.3 failure reconstruction geometry**
## License
MIT
Provided by:
ClarusC64



