AserLompo/khp-youth-mental-health-guardrail
Hugging Face · updated 2026-03-26 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/AserLompo/khp-youth-mental-health-guardrail
---
language:
- en
- fr
license: mit
task_categories:
- text-classification
task_ids:
- multi-label-classification
tags:
- mental-health
- safety
- youth
- synthetic
- guardrail
- conversational
- distress-detection
pretty_name: KHP Youth Mental Health Safety Guardrail Dataset
size_categories:
- 1K<n<10K
---
# KHP Youth Mental Health Safety Guardrail Dataset
A synthetic multi-turn conversational dataset for training and evaluating input guardrails for AI assistants serving youth in mental distress. The dataset targets **9 distress signal categories** and provides a binary **high-risk label** for classification, decoupled from the individual signal flags.
**GitHub repository:** [Aser97/Guardrail-For-Agents](https://github.com/Aser97/Guardrail-For-Agents)
---
## Dataset Summary
| Split | Rows | High-risk | Low-risk |
|-------|------|-----------|----------|
| Train | ~1578 | ~789 | ~789 |
| Validation | ~196 | ~98 | ~98 |
| Test | ~200 | ~100 | ~100 |
| **Total** | **1974** | **987** | **987** |
The dataset is perfectly balanced (50/50 high-risk / low-risk) and covers **English**, **French**, and **mixed EN/FR** conversations.
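To check the label and language balance on a loaded split, a small tally is enough. This is an illustrative sketch: `split_stats` is a hypothetical helper, and the toy rows below stand in for real data. Iterating a Hugging Face `Dataset` split yields rows as dicts, so the same function should apply directly to a split such as `ds["train"]`.

```python
from collections import Counter


def split_stats(rows):
    """Tally labels and languages for one split (any iterable of dict rows)."""
    labels = Counter()
    languages = Counter()
    for row in rows:
        labels[row["label"]] += 1
        languages[row["language"]] += 1
    total = sum(labels.values())
    # Share of high-risk rows (label == 1); 0.5 means a perfectly balanced split.
    high_risk_share = labels[1] / total if total else 0.0
    return {
        "total": total,
        "labels": dict(labels),
        "languages": dict(languages),
        "high_risk_share": high_risk_share,
    }


# Toy rows for illustration only; not real dataset content.
toy = [
    {"label": 1, "language": "en"},
    {"label": 0, "language": "fr"},
    {"label": 1, "language": "mix"},
    {"label": 0, "language": "en"},
]
stats = split_stats(toy)
print(stats["high_risk_share"])  # 0.5
```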
---
## Motivation
To our knowledge, no existing dataset combines:
- Multi-turn conversational format (not single posts/messages)
- Youth-specific register (age 13–17, informal language, slang)
- Bilingual EN/FR coverage
- Nine distinct distress signal categories with individual labels
- A high-risk label explicitly decoupled from simple signal OR-logic
This dataset was built to fill that gap by combining synthetic generation, adversarial refinement, rewritten public ESConv conversations, and data augmentation.
---
## Signal Taxonomy
Each row carries 9 binary signal flags (`s_*` columns) alongside a global `label` field:
| Column | Signal | Description |
|--------|--------|-------------|
| `s_burden_language` | Burden / Suicidal ideation | Expressions of being a burden, not wanting to exist |
| `s_finality_language` | Finality language | Plans, timelines, "I can't do this anymore" |
| `s_escape_framing` | Escape framing | Disappearance, death as coping, avoidance |
| `s_hopelessness` | Hopelessness | Feeling worthless, trapped, numb, guilty |
| `s_active_self_harm` | Active self-harm | Escalating frequency/severity; self-punishment |
| `s_immediate_safety` | Immediate safety | Violence; unsafe environment; coercion |
| `s_self_image_crisis` | Self-image crisis | Identity confusion; self-worth collapse |
| `s_third_party_concern` | Third-party concern | Concern about a friend or family member's safety |
| `s_testing` | Testing | Probing safety limits; repeated hesitant engagement |
**Important:** `label = 1` (high-risk) is **not** equivalent to `any(s_* == 1)`. Signals can be present without warranting escalation (e.g. hypothetical framing, third-party concern without acute risk). The `label` field reflects a holistic assessment of whether the conversation requires urgent support.
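Because `label` is not the OR of the signal flags, it can be useful to measure how often the two disagree in a split. The sketch below assumes rows are dicts using the column names from the table above; `any_signal` and `or_logic_disagreements` are hypothetical helper names, and the example row is constructed for illustration (a third-party concern without acute risk).

```python
SIGNAL_COLUMNS = [
    "s_burden_language", "s_finality_language", "s_escape_framing",
    "s_hopelessness", "s_active_self_harm", "s_immediate_safety",
    "s_self_image_crisis", "s_third_party_concern", "s_testing",
]


def any_signal(row):
    """Naive OR-logic over the nine signal flags (NOT how `label` is defined)."""
    return int(any(row[col] == 1 for col in SIGNAL_COLUMNS))


def or_logic_disagreements(rows):
    """Rows where the holistic `label` differs from the naive OR of the signals."""
    return [row for row in rows if row["label"] != any_signal(row)]


# Constructed example: third-party concern flagged, but not judged high-risk.
example = {col: 0 for col in SIGNAL_COLUMNS}
example["s_third_party_concern"] = 1
example["label"] = 0

print(any_signal(example), example["label"])  # 1 0
print(len(or_logic_disagreements([example])))  # 1
```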
---
## Data Generation
The dataset combines the following generation and augmentation techniques:
| Source tag | Technique | Risk level |
|---|---|---|
| `generate_scratch` | Single-model generation from structured prompt (Mistral / Llama) | High-risk |
| `generate_camel` | CAMEL dual-agent role-play (Li et al., 2023) | High-risk |
| `camel_hard_subtle` / `camel_hard_escalating` | CAMEL with enforced hardness tracks | High-risk |
| `pair_hard_positive` / `pair_hard_subtle` | PAIR adversarial loop (Chao et al., 2023) | High-risk |
| `pair_adversarial_negative` | PAIR hard negatives (mimicry without genuine signal) | Low-risk |
| `esconv` | ESConv public dataset (Liu et al., 2021) — adult conversations rewritten for youth persona | Low-risk |
| `esconv_swapped` | ESConv with mandatory persona swap (adult → youth) | Low-risk |
| `lang_rewrite` | Augmentation: EN ↔ FR language rewrite | Low-risk |
| `persona_swap` | Augmentation: demographic persona rewrite | Mixed |
| `signal_inject_khp` / `signal_inject_esconv` | Augmentation: signal injection into low-risk seed | High-risk |
| `signal_soften` | Augmentation: signal softening (borderline examples) | Low-risk |
All generated rows were verified by a **Claude Sonnet** jury scoring realism, subtlety, and signal presence.
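The table's risk-level column can double as a sanity check on the `label` field. The sketch below assumes each tag's listed risk level maps directly to `label` (1 = high-risk, 0 = low-risk), which the table implies but does not state explicitly; `persona_swap` is marked Mixed, so it is left unchecked. `source_label_mismatches` is a hypothetical helper, and the toy rows are invented.

```python
from collections import Counter

# Expected label per `source` tag, transcribed from the table above.
# None means "Mixed": the tag can legitimately carry either label.
EXPECTED_LABEL = {
    "generate_scratch": 1,
    "generate_camel": 1,
    "camel_hard_subtle": 1,
    "camel_hard_escalating": 1,
    "pair_hard_positive": 1,
    "pair_hard_subtle": 1,
    "pair_adversarial_negative": 0,
    "esconv": 0,
    "esconv_swapped": 0,
    "lang_rewrite": 0,
    "persona_swap": None,
    "signal_inject_khp": 1,
    "signal_inject_esconv": 1,
    "signal_soften": 0,
}


def source_label_mismatches(rows):
    """Count (source, label) pairs whose label contradicts the table's risk level."""
    mismatches = Counter()
    for row in rows:
        expected = EXPECTED_LABEL.get(row["source"])  # unknown tags are skipped
        if expected is not None and row["label"] != expected:
            mismatches[(row["source"], row["label"])] += 1
    return mismatches


toy = [
    {"source": "pair_adversarial_negative", "label": 0},
    {"source": "signal_inject_khp", "label": 1},
    {"source": "persona_swap", "label": 1},   # Mixed: not checked
    {"source": "signal_soften", "label": 1},  # contradicts the table
]
print(source_label_mismatches(toy))  # Counter({('signal_soften', 1): 1})
```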
---
## Schema
```python
{
"text": str, # Full multi-turn conversation (user/assistant turns)
"label": int, # 0 = low-risk, 1 = high-risk
"source": str, # Generation technique (see table above)
"primary_signal": str, # Main distress signal targeted during generation
"escalation_stage": str, # Stage of signal escalation (where applicable)
"register": str, # Language register / slang profile used
"language": str, # "en", "fr", or "mix"
"persona_id": str, # Persona profile identifier
"category": str, # Thematic category of the stressor
# Individual signal flags (0 or 1):
"s_burden_language": int,
"s_finality_language": int,
"s_escape_framing": int,
"s_hopelessness": int,
"s_active_self_harm": int,
"s_immediate_safety": int,
"s_self_image_crisis": int,
"s_third_party_concern": int,
"s_testing": int,
}
```
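A lightweight row validator can catch schema drift before training. This is a sketch: `validate_row` is a hypothetical helper, and it assumes `escalation_stage` may be missing or null when it does not apply, which the "where applicable" note suggests but does not confirm. The field values in `good` are placeholders, not real rows.

```python
from typing import Any, Dict, List

STRING_FIELDS = ["text", "source", "primary_signal", "register",
                 "language", "persona_id", "category"]
OPTIONAL_STRING_FIELDS = ["escalation_stage"]  # assumption: may be null
SIGNAL_COLUMNS = [
    "s_burden_language", "s_finality_language", "s_escape_framing",
    "s_hopelessness", "s_active_self_harm", "s_immediate_safety",
    "s_self_image_crisis", "s_third_party_concern", "s_testing",
]
ALLOWED_LANGUAGES = {"en", "fr", "mix"}


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems (empty if the row looks valid)."""
    problems: List[str] = []
    for field in STRING_FIELDS:
        if not isinstance(row.get(field), str):
            problems.append(f"{field}: expected str")
    for field in OPTIONAL_STRING_FIELDS:
        value = row.get(field)
        if value is not None and not isinstance(value, str):
            problems.append(f"{field}: expected str or None")
    if isinstance(row.get("language"), str) and row["language"] not in ALLOWED_LANGUAGES:
        problems.append(f"language: unexpected value {row['language']!r}")
    # `label` and every signal flag must be binary.
    for field in ["label", *SIGNAL_COLUMNS]:
        if row.get(field) not in (0, 1):
            problems.append(f"{field}: expected 0 or 1")
    return problems


# Placeholder values for illustration only.
good = {
    "text": "user: ...\nassistant: ...",
    "label": 0, "source": "esconv", "primary_signal": "placeholder",
    "escalation_stage": None, "register": "placeholder", "language": "en",
    "persona_id": "placeholder", "category": "placeholder",
    **{col: 0 for col in SIGNAL_COLUMNS},
}
print(validate_row(good))  # []
```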
---
## Usage
```python
from datasets import load_dataset
ds = load_dataset("AserLompo/khp-youth-mental-health-guardrail")
# Binary classification (high-risk / low-risk)
train = ds["train"]
print(train[0]["text"])
print("High-risk:", train[0]["label"])
# Multi-label signal detection
print("Hopelessness:", train[0]["s_hopelessness"])
print("Immediate safety:", train[0]["s_immediate_safety"])
```
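For training, each row splits into a binary target and a 9-dimensional multi-hot signal vector. The sketch below uses hypothetical helper names and plain dicts; per-signal prevalence is included because imbalanced signals are a common reason to reweight a multi-label loss. The example row is a placeholder.

```python
from typing import Dict, List, Tuple

SIGNAL_COLUMNS = [
    "s_burden_language", "s_finality_language", "s_escape_framing",
    "s_hopelessness", "s_active_self_harm", "s_immediate_safety",
    "s_self_image_crisis", "s_third_party_concern", "s_testing",
]


def to_targets(row: Dict) -> Tuple[str, int, List[int]]:
    """Split one row into (conversation text, binary label, multi-hot signal vector)."""
    signals = [int(row[col]) for col in SIGNAL_COLUMNS]
    return row["text"], int(row["label"]), signals


def signal_prevalence(rows) -> Dict[str, float]:
    """Fraction of rows in which each signal flag is set."""
    counts = dict.fromkeys(SIGNAL_COLUMNS, 0)
    n = 0
    for row in rows:
        n += 1
        for col in SIGNAL_COLUMNS:
            counts[col] += int(row[col])
    return {col: (counts[col] / n if n else 0.0) for col in SIGNAL_COLUMNS}


# Placeholder row for illustration only.
row = {"text": "user: ...", "label": 1, **{col: 0 for col in SIGNAL_COLUMNS}}
row["s_hopelessness"] = 1
text, label, signals = to_targets(row)
print(label, signals)  # 1 [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

With the `datasets` library, something like `train.map(lambda r: {"signals": to_targets(r)[2]})` would add the vector as a new column.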
---
## Recommended Model
The dataset was used to fine-tune **Qwen2.5-7B-Instruct** with LoRA (r=16, α=32) and a 9-output multi-label classification head (one output per signal), followed by a logistic regression aggregation head that maps the 9 signal probabilities to the final high-risk binary prediction.
See the [GitHub repository](https://github.com/Aser97/Guardrail-For-Agents) for training scripts and the full pipeline.
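The aggregation step can be illustrated with a plain logistic regression over the nine signal probabilities. This is a stdlib-only sketch fitted by batch gradient descent, not the repository's code: its solver, regularization strength, and decision threshold are assumptions, the LoRA stage is not reproduced, and the toy probability vectors are invented.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, p: Sequence[float]) -> float:
    """High-risk probability for one vector of nine signal probabilities."""
    return sigmoid(sum(wi * pi for wi, pi in zip(w, p)) + b)


def fit_aggregator(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
    l2: float = 1e-3,
) -> Tuple[List[float], float]:
    """Batch gradient descent on mean log-loss plus (l2 / 2) * ||w||^2."""
    dim, n = len(probs[0]), len(probs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for p, y in zip(probs, labels):
            err = score(w, b, p) - y  # d(log-loss)/d(logit)
            for j in range(dim):
                grad_w[j] += err * p[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, p, threshold: float = 0.5) -> int:
    return int(score(w, b, p) >= threshold)


# Toy signal-probability vectors (order follows the nine s_* columns).
X = [
    [0.9, 0.8, 0.7, 0.9, 0.1, 0.1, 0.2, 0.0, 0.1],  # high-risk
    [0.8, 0.9, 0.6, 0.7, 0.5, 0.2, 0.1, 0.1, 0.0],  # high-risk
    [0.1, 0.0, 0.1, 0.2, 0.0, 0.1, 0.1, 0.9, 0.1],  # low-risk: third-party only
    [0.0, 0.1, 0.0, 0.1, 0.0, 0.0, 0.2, 0.1, 0.6],  # low-risk: testing only
]
y = [1, 1, 0, 0]
w, b = fit_aggregator(X, y)
print([predict(w, b, p) for p in X])  # [1, 1, 0, 0]
```

A learned aggregator like this is one way to let `label` depart from OR-logic: a third-party or testing signal alone can receive a negative weight.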
---
## Evaluation
The fine-tuned model was evaluated on an external real-world youth mental health corpus (94 samples):
| Metric | Score |
|--------|-------|
| Precision | 0.8222 |
| Recall | 0.8810 |
| F1 | 0.8506 |
Confusion matrix: TP=37, FP=8, TN=44, FN=5. All 5 false negatives are qualitatively borderline cases where annotator agreement is low.
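The reported scores follow directly from the confusion matrix; the short check below recomputes them. `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, recall and F1 for the positive (high-risk) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


m = binary_metrics(tp=37, fp=8, tn=44, fn=5)
print(" ".join(f"{k}={v:.4f}" for k, v in m.items()))
# precision=0.8222 recall=0.8810 f1=0.8506
```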
---
## Ethical Considerations
- All conversations are **fully synthetic** — no real user data was used.
- The dataset is intended for **safety classifier training only**, not for generating harmful content.
- Signal categories and conversation content involve sensitive topics (self-harm, suicidal ideation, abuse). Researchers using this dataset should follow appropriate ethical review processes.
- The high-risk label reflects a calibrated safety threshold, not a clinical diagnosis.
---
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{lompo2026khp,
author = {Lompo, Boammani Aser},
title = {{KHP Youth Mental Health Safety Guardrail Dataset}},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/AserLompo/khp-youth-mental-health-guardrail}
}
```
The following works were instrumental in the data generation pipeline:
- Li et al. (2023). *CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society.* NeurIPS. [arXiv:2303.17760](https://arxiv.org/abs/2303.17760)
- Chao et al. (2023). *Jailbreaking Black Box Large Language Models in Twenty Queries.* [arXiv:2310.08419](https://arxiv.org/abs/2310.08419)
- Liu et al. (2021). *Towards Emotional Support Dialog Systems.* ACL 2021. [arXiv:2106.01144](https://arxiv.org/abs/2106.01144)
---
## License
MIT. The license permits research and commercial use alike; please review the ethical considerations above before any use.
Provided by:
AserLompo



