AserLompo/khp-youth-mental-health-guardrail

Name: AserLompo/khp-youth-mental-health-guardrail
Creator: AserLompo
Published: 2026-03-26 20:36:59
License: No description available

Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/AserLompo/khp-youth-mental-health-guardrail


Resource description:

---
language:
- en
- fr
license: mit
task_categories:
- text-classification
task_ids:
- multi-label-classification
tags:
- mental-health
- safety
- youth
- synthetic
- guardrail
- conversational
- distress-detection
pretty_name: KHP Youth Mental Health Safety Guardrail Dataset
size_categories:
- 1K<n<10K
---

# KHP Youth Mental Health Safety Guardrail Dataset

A synthetic multi-turn conversational dataset for training and evaluating input guardrails for AI assistants serving youth in mental distress. The dataset targets **9 distress signal categories** and provides a binary **high-risk label** for classification, decoupled from the individual signal flags.

**GitHub repository:** [Aser97/Guardrail-For-Agents](https://github.com/Aser97/Guardrail-For-Agents)

---

## Dataset Summary

| Split | Rows | High-risk | Low-risk |
|-------|------|-----------|----------|
| Train | ~1578 | ~789 | ~789 |
| Validation | ~196 | ~98 | ~98 |
| Test | ~200 | ~100 | ~100 |
| **Total** | **1974** | **987** | **987** |

The dataset is perfectly balanced (50/50 high-risk / low-risk) and covers **English**, **French**, and **mixed EN/FR** conversations.

---

## Motivation

There are no existing datasets that combine:

- Multi-turn conversational format (not single posts/messages)
- Youth-specific register (ages 13–17, informal language, slang)
- Bilingual EN/FR coverage
- Nine distinct distress signal categories with individual labels
- A high-risk label explicitly decoupled from simple signal OR-logic

This dataset was built from scratch using a combination of adversarial and augmentation techniques to fill that gap.
---

## Signal Taxonomy

Each row carries 9 binary signal flags (`s_*` columns) alongside a global `label` field:

| Column | Signal | Description |
|--------|--------|-------------|
| `s_burden_language` | Burden / Suicidal ideation | Expressions of being a burden, not wanting to exist |
| `s_finality_language` | Finality language | Plans, timelines, "I can't do this anymore" |
| `s_escape_framing` | Escape framing | Disappearance, death as coping, avoidance |
| `s_hopelessness` | Hopelessness | Feeling worthless, trapped, numb, guilty |
| `s_active_self_harm` | Active self-harm | Escalating frequency/severity; self-punishment |
| `s_immediate_safety` | Immediate safety | Violence; unsafe environment; coercion |
| `s_self_image_crisis` | Self-image crisis | Identity confusion; self-worth collapse |
| `s_third_party_concern` | Third-party concern | Concern about a friend or family member's safety |
| `s_testing` | Testing | Probing safety limits; repeated hesitant engagement |

**Important:** `label = 1` (high-risk) is **not** equivalent to `any(s_* == 1)`. Signals can be present without warranting escalation (e.g. hypothetical framing, third-party concern without acute risk). The `label` field reflects a holistic assessment of whether the conversation requires urgent support.
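To check how often the holistic `label` departs from naive OR-logic on a local copy of the data, a comparison along these lines may help. This is a minimal sketch that assumes rows are plain dicts carrying the `label` and `s_*` columns listed above; the helper names (`any_signal`, `or_logic_disagreements`, `make_row`) and the sample rows are hypothetical, not part of the dataset or its tooling.

```python
# Minimal sketch: compare the holistic `label` with a naive "any signal fires" rule.
# Column names follow the taxonomy table; helpers and sample rows are hypothetical.
from typing import Dict, List

SIGNAL_COLUMNS = [
    "s_burden_language",
    "s_finality_language",
    "s_escape_framing",
    "s_hopelessness",
    "s_active_self_harm",
    "s_immediate_safety",
    "s_self_image_crisis",
    "s_third_party_concern",
    "s_testing",
]


def any_signal(row: Dict[str, int]) -> int:
    """Naive OR-logic baseline: 1 if any s_* flag is set, else 0."""
    return int(any(row[col] == 1 for col in SIGNAL_COLUMNS))


def or_logic_disagreements(rows: List[Dict[str, int]]) -> List[int]:
    """Return the indices of rows whose `label` differs from the OR baseline."""
    return [i for i, row in enumerate(rows) if row["label"] != any_signal(row)]


def make_row(label: int, **flags: int) -> Dict[str, int]:
    """Build a hypothetical row with all flags at 0 except those given."""
    row = {col: 0 for col in SIGNAL_COLUMNS}
    row.update(flags)
    row["label"] = label
    return row


# Hypothetical rows: third-party concern without acute risk (label 0),
# finality language plus hopelessness (label 1), and no signal at all (label 0).
rows = [
    make_row(0, s_third_party_concern=1),
    make_row(1, s_finality_language=1, s_hopelessness=1),
    make_row(0),
]
print(or_logic_disagreements(rows))  # [0]
```

Only the first row disagrees: a signal is present, but the conversation is not labelled high-risk, which is exactly the decoupling the note above describes.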
---

## Data Generation

The dataset was built using four complementary techniques:

| Source tag | Technique | Risk level |
|---|---|---|
| `generate_scratch` | Single-model generation from a structured prompt (Mistral / Llama) | High-risk |
| `generate_camel` | CAMEL dual-agent role-play (Li et al., 2023) | High-risk |
| `camel_hard_subtle` / `camel_hard_escalating` | CAMEL with enforced hardness tracks | High-risk |
| `pair_hard_positive` / `pair_hard_subtle` | PAIR adversarial loop (Chao et al., 2023) | High-risk |
| `pair_adversarial_negative` | PAIR hard negatives (mimicry without genuine signal) | Low-risk |
| `esconv` | ESConv public dataset (Liu et al., 2021): adult conversations rewritten for a youth persona | Low-risk |
| `esconv_swapped` | ESConv with mandatory persona swap (adult → youth) | Low-risk |
| `lang_rewrite` | Augmentation: EN ↔ FR language rewrite | Low-risk |
| `persona_swap` | Augmentation: demographic persona rewrite | Mixed |
| `signal_inject_khp` / `signal_inject_esconv` | Augmentation: signal injection into a low-risk seed | High-risk |
| `signal_soften` | Augmentation: signal softening (borderline examples) | Low-risk |

All generated rows were verified by a **Claude Sonnet** jury that scored realism, subtlety, and signal presence.
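Because each row records its generation technique in `source`, the risk-level column of the table above can be spot-checked by tallying the high-risk fraction per tag. This is a minimal sketch that assumes rows are dicts with `source` and `label` keys; the function name and sample rows are hypothetical.

```python
# Minimal sketch: high-risk fraction per `source` tag.
# The function name and the sample rows are hypothetical illustrations.
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def high_risk_rate_by_source(rows: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    counts: Dict[str, int] = defaultdict(int)     # rows seen per source tag
    positives: Dict[str, int] = defaultdict(int)  # rows with label == 1 per tag
    for row in rows:
        source = str(row["source"])
        counts[source] += 1
        positives[source] += int(row["label"])
    # Sorted by tag so the output order is deterministic.
    return {src: positives[src] / counts[src] for src in sorted(counts)}


rows = [
    {"source": "generate_camel", "label": 1},
    {"source": "esconv", "label": 0},
    {"source": "persona_swap", "label": 1},
    {"source": "persona_swap", "label": 0},
]
print(high_risk_rate_by_source(rows))
# {'esconv': 0.0, 'generate_camel': 1.0, 'persona_swap': 0.5}
```

Tags listed as High-risk or Low-risk should come out at 1.0 or 0.0 respectively, while a tag marked Mixed, such as `persona_swap`, can fall anywhere in between.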
---

## Schema

```python
{
    "text": str,              # Full multi-turn conversation (user/assistant turns)
    "label": int,             # 0 = low-risk, 1 = high-risk
    "source": str,            # Generation technique (see table above)
    "primary_signal": str,    # Main distress signal targeted during generation
    "escalation_stage": str,  # Stage of signal escalation (where applicable)
    "register": str,          # Language register / slang profile used
    "language": str,          # "en", "fr", or "mix"
    "persona_id": str,        # Persona profile identifier
    "category": str,          # Thematic category of the stressor
    # Individual signal flags (0 or 1):
    "s_burden_language": int,
    "s_finality_language": int,
    "s_escape_framing": int,
    "s_hopelessness": int,
    "s_active_self_harm": int,
    "s_immediate_safety": int,
    "s_self_image_crisis": int,
    "s_third_party_concern": int,
    "s_testing": int,
}
```

---

## Usage

```python
from datasets import load_dataset

# Hub ID as given in the download link and citation (AserLompo, not the GitHub handle Aser97)
ds = load_dataset("AserLompo/khp-youth-mental-health-guardrail")

# Binary classification (high-risk / low-risk)
train = ds["train"]
print(train[0]["text"])
print("High-risk:", train[0]["label"])

# Multi-label signal detection
print("Hopelessness:", train[0]["s_hopelessness"])
print("Immediate safety:", train[0]["s_immediate_safety"])
```

---

## Recommended Model

The dataset was used to fine-tune **Qwen2.5-7B-Instruct** with LoRA (r=16, α=32) and a 9-head multi-label classification layer, followed by a logistic regression aggregation head that maps the 9 signal probabilities to the final high-risk binary prediction.

See the [GitHub repository](https://github.com/Aser97/Guardrail-For-Agents) for training scripts and the full pipeline.

---

## Evaluation

Evaluated on an external real-world youth mental health corpus (94 samples):

| Metric | Score |
|--------|-------|
| Precision | 0.8222 |
| Recall | 0.8810 |
| F1 | 0.8506 |

Confusion matrix: TP=37, FP=8, TN=44, FN=5.

All 5 false negatives are qualitatively borderline cases where annotator agreement is low.
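As a sanity check, the scores in the evaluation table can be recomputed from the reported confusion matrix using the standard definitions of precision, recall, and F1. This is a minimal sketch; the function name `classification_metrics` is illustrative, and TN is not needed for these three metrics.

```python
# Minimal sketch: recompute precision, recall and F1 from the reported
# confusion matrix (TP=37, FP=8, FN=5; TN=44 does not enter these metrics).
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    precision = tp / (tp + fp)  # share of flagged conversations that were high-risk
    recall = tp / (tp + fn)     # share of high-risk conversations that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"precision": precision, "recall": recall, "f1": f1}


m = classification_metrics(tp=37, fp=8, fn=5)
print({name: round(value, 4) for name, value in m.items()})
# {'precision': 0.8222, 'recall': 0.881, 'f1': 0.8506}
```

The output matches the table to four decimals. Recall prints as 0.881 because Python drops the trailing zero of 0.8810.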
---

## Ethical Considerations

- All conversations are **fully synthetic**: no real user data was used.
- The dataset is intended for **safety classifier training only**, not for generating harmful content.
- Signal categories and conversation content involve sensitive topics (self-harm, suicidal ideation, abuse). Researchers using this dataset should follow appropriate ethical review processes.
- The high-risk label reflects a calibrated safety threshold, not a clinical diagnosis.

---

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{lompo2026khp,
  author    = {Lompo, Boammani Aser},
  title     = {{KHP Youth Mental Health Safety Guardrail Dataset}},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/datasets/AserLompo/khp-youth-mental-health-guardrail}
}
```

The following works were instrumental in the data generation pipeline:

- Li et al. (2023). *CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society.* NeurIPS 2023. [arXiv:2303.17760](https://arxiv.org/abs/2303.17760)
- Chao et al. (2023). *Jailbreaking Black Box Large Language Models in Twenty Queries.* [arXiv:2310.08419](https://arxiv.org/abs/2310.08419)
- Liu et al. (2021). *Towards Emotional Support Dialog Systems.* ACL 2021. [arXiv:2106.01144](https://arxiv.org/abs/2106.01144)

---

## License

MIT (a permissive license; its terms do not restrict commercial use). Please review the ethical considerations above before use.

Provider:

AserLompo
