aditijc/dhf-smoke-canary

Name: aditijc/dhf-smoke-canary
Creator: aditijc
Published: 2026-04-27 17:04:58
License: MIT

Source: Hugging Face · Updated: 2026-04-27 · Indexed: 2026-05-03

Download link:

https://hf-mirror.com/datasets/aditijc/dhf-smoke-canary


Resource overview:

---
license: mit
tags:
- disentangled-health-futures
- mimic-iv
- mlm
- canary
---

# dhf-smoke-canary

Phase 1.6a smoke-canary that validates the new interpretability surface: eval_mlm (3x8=24 windows), viz_mask_reconstruct (2 cases), dump_embeddings (100 windows), and retrieval_sanity (3 queries). The run confirms that GCS-total accounts for 18% of masked tokens; the model's value-head outputs cluster near the training mean (1.9 GCS), confirming that the value encoder is uninformative.

## Dataset Info

- **Rows**: 16
- **Columns**: 13

## Columns

| Column | Type | Description |
|--------|------|-------------|
| code_id | Value('string') | Integer vocab ID for the masked code |
| token | Value('string') | Raw vocab string (matches code_vocab.csv; 'Glascow' typo preserved) |
| label | Value('string') | Plain-English label with units and clinical reference range |
| units | Value('string') | Measurement units (bpm, mmHg, °C, %, etc.) |
| n | Value('string') | Number of masked tokens of this code in the eval (out of total `summary.metrics.none.masked`) |
| top1 | Value('string') | Count of correct top-1 predictions |
| top5 | Value('string') | Count of correct top-5 predictions |
| top1_acc | Value('string') | top1 / n |
| top5_acc | Value('string') | top5 / n |
| value_pred_mean_human_units | Value('string') | Mean predicted value reverse-z'd to clinical units (using --value_stats_path) |
| value_true_mean_human_units | Value('string') | Mean true value reverse-z'd to clinical units |
| value_mse_human_units | Value('string') | MSE between predicted and true value, in clinical units squared |
| pct_out_of_range | Value('string') | Percentage of value predictions falling outside the per-code clip thresholds |

## Generation Parameters

```json
{
  "script_name": "scripts/upload_eval_artifacts.py",
  "model": "mlm_baseline.pt",
  "description": "Phase 1.6a smoke-canary: validates new interpretability surface. eval_mlm 3x8=24 windows; viz_mask_reconstruct 2 cases; dump_embeddings 100 windows; retrieval_sanity 3 queries. Confirms GCS-total dominates 18% of masked tokens; model value-head outputs cluster near training mean (1.9 GCS) \u2014 confirming value encoder is uninformative.",
  "experiment_name": "disentangled-health-futures",
  "cluster": "torch",
  "artifact_status": "partial",
  "canary": true,
  "value_stats_path": "mimic_datasets/mimic_iv/3.1/processed/code_value_stats_pre_zero_filter.csv",
  "clip_thresholds_path": "mimic_datasets/mimic_iv/3.1/processed/code_clip_thresholds_pre_zero_filter.csv",
  "split": "val",
  "batches": 5,
  "batch_size": 16,
  "ablation_summary": [
    {
      "mode": "none",
      "masked": "2792",
      "loss_ce": "2.337382756536533",
      "top1": "0.17765042979942694",
      "top5": "0.6608166189111748",
      "mean_p_true": "0.11430661657819775"
    },
    {
      "mode": "no_value",
      "masked": "2708",
      "loss_ce": "2.3109693879390925",
      "top1": "0.18500738552437224",
      "top5": "0.6698670605612999",
      "mean_p_true": "0.11474200717807696"
    },
    {
      "mode": "no_dt",
      "masked": "2695",
      "loss_ce": "2.3586271324936225",
      "top1": "0.1699443413729128",
      "top5": "0.660482374768089",
      "mean_p_true": "0.11313330627328168"
    },
    {
      "mode": "no_diag",
      "masked": "2631",
      "loss_ce": "2.31371703776784",
      "top1": "0.19080197643481567",
      "top5": "0.6894716837704294",
      "mean_p_true": "0.11599155108496428"
    },
    {
      "mode": "no_value_dt",
      "masked": "2725",
      "loss_ce": "2.349891861381881",
      "top1": "0.1669724770642202",
      "top5": "0.658348623853211",
      "mean_p_true": "0.11251883550521431"
    }
  ],
  "ablation_metrics": {
    "none": {
      "masked": 2792,
      "loss_ce": 2.337382756536533,
      "top1": 0.17765042979942694,
      "top5": 0.6608166189111748,
      "mean_p_true": 0.11430661657819775
    },
    "no_value": {
      "masked": 2708,
      "loss_ce": 2.3109693879390925,
      "top1": 0.18500738552437224,
      "top5": 0.6698670605612999,
      "mean_p_true": 0.11474200717807696
    },
    "no_dt": {
      "masked": 2695,
      "loss_ce": 2.3586271324936225,
      "top1": 0.1699443413729128,
      "top5": 0.660482374768089,
      "mean_p_true": 0.11313330627328168
    },
    "no_diag": {
      "masked": 2631,
      "loss_ce": 2.31371703776784,
      "top1": 0.19080197643481567,
      "top5": 0.6894716837704294,
      "mean_p_true": 0.11599155108496428
    },
    "no_value_dt": {
      "masked": 2725,
      "loss_ce": 2.349891861381881,
      "top1": 0.1669724770642202,
      "top5": 0.658348623853211,
      "mean_p_true": 0.11251883550521431
    }
  },
  "baseline_most_frequent": {
    "code_id": 8,
    "token": "Glascow coma scale total",
    "label": "Glasgow Coma Scale (total) (0-15, normal 3-15)",
    "freq": 510,
    "accuracy": 0.1826647564469914
  },
  "hyperparameters": {},
  "input_datasets": []
}
```

## Usage

```python
from datasets import load_dataset

# Load the single "train" split of the canary dataset from the Hugging Face Hub.
dataset = load_dataset("aditijc/dhf-smoke-canary", split="train")
print(f"Loaded {len(dataset)} rows")
```
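Since every column is typed `Value('string')`, the counts need casting before any arithmetic. The sketch below is one possible way to do that on a plain dict row and to compare the `none` ablation's top-1 accuracy against the `baseline_most_frequent` entry. The constants are copied from the Generation Parameters above. The helper names `baseline_accuracy` and `cast_row` and the example row are hypothetical and not part of the dataset or its scripts. It also assumes the count strings parse as numbers, which the card does not state.

```python
from typing import Dict

# Copied from the card's Generation Parameters ("none" ablation and
# baseline_most_frequent). Nothing here is read from the Hub.
MASKED = 2792
MODEL_TOP1 = 0.17765042979942694
BASELINE_FREQ = 510


def baseline_accuracy(freq: int, masked: int) -> float:
    """Accuracy of always predicting the most frequent code."""
    return freq / masked


def cast_row(row: Dict[str, str]) -> Dict[str, object]:
    """Cast string-typed count columns to int and recompute top1/top5 accuracy.

    Assumption: counts are stored as numeric strings such as "40" or "40.0".
    """
    n = int(float(row["n"]))
    top1 = int(float(row["top1"]))
    top5 = int(float(row["top5"]))
    return {
        "token": row["token"],
        "n": n,
        "top1": top1,
        "top5": top5,
        # Guard against codes that were never masked in the eval.
        "top1_acc": top1 / n if n else float("nan"),
        "top5_acc": top5 / n if n else float("nan"),
    }


if __name__ == "__main__":
    base = baseline_accuracy(BASELINE_FREQ, MASKED)
    print(f"baseline={base:.4f} model_top1={MODEL_TOP1:.4f} gap={MODEL_TOP1 - base:+.4f}")
    # Made-up row for illustration only; real rows come from load_dataset(...).
    example = {"token": "hypothetical code", "n": "40", "top1": "10", "top5": "30"}
    print(cast_row(example))
```

With the reported numbers, the model's top-1 accuracy in the `none` ablation (about 0.178) sits slightly below the accuracy of always predicting the most frequent code (510 / 2792, about 0.183).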

Provider:

aditijc
