arnaiztech/llms-mental-health-crisis-responses
Source: Hugging Face. Updated 2026-04-13; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/arnaiztech/llms-mental-health-crisis-responses
---
pretty_name: Between Help and Harm - Responses and Evaluations
license: cc-by-nc-4.0
language:
- en
multilinguality: monolingual
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- found
tags:
- mental-health
- safety
- llm-evaluation
- response-quality
- conversations
configs:
- config_name: responses
  default: true
  data_files:
  - split: validation
    path: responses/validation/*.parquet
  - split: test
    path: responses/test/*.parquet
- config_name: human_raw_scores
  data_files:
  - split: validation
    path: evaluations/human_raw/validation/*.parquet
- config_name: sampled_review_sets
  data_files:
  - split: validation
    path: evaluations/sampled_review_sets/*.parquet
- config_name: llm_raw_evaluations_gpt_4o_mini
  data_files:
  - split: validation
    path: evaluations/llm_raw/gpt-4o-mini/validation/*.parquet
  - split: test
    path: evaluations/llm_raw/gpt-4o-mini/test/*.parquet
- config_name: llm_raw_evaluations_gpt_5_nano
  data_files:
  - split: validation
    path: evaluations/llm_raw/gpt-5-nano/validation/*.parquet
- config_name: llm_raw_evaluations_llama_4_scout_17b_16e_instruct
  data_files:
  - split: validation
    path: evaluations/llm_raw/llama-4-scout-17b-16e-instruct/validation/*.parquet
- config_name: llm_merged_evaluations_gpt_4o_mini
  data_files:
  - split: validation
    path: evaluations/llm_merged/gpt-4o-mini/validation/*.parquet
  - split: test
    path: evaluations/llm_merged/gpt-4o-mini/test/*.parquet
- config_name: llm_merged_evaluations_gpt_5_nano
  data_files:
  - split: validation
    path: evaluations/llm_merged/gpt-5-nano/validation/*.parquet
- config_name: llm_merged_evaluations_llama_4_scout_17b_16e_instruct
  data_files:
  - split: validation
    path: evaluations/llm_merged/llama-4-scout-17b-16e-instruct/validation/*.parquet
---
# Dataset Card for Between Help and Harm - Responses and Evaluations
## Dataset Summary
This dataset repo contains the response-side artifacts prepared for Hugging Face from the paper *Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs*, published in *JMIR Mental Health*.
If you use this dataset, please cite the paper. The citation is included below, the arXiv version is available at <https://arxiv.org/abs/2509.24857>, and the DOI is <https://doi.org/10.2196/88435>.
It is organized around the response-level evaluation task:
- `responses/`: model responses aligned to canonical `example_id`
- `evaluations/human_raw/`: anonymized human appropriateness scores on the validation subset
- `evaluations/llm_raw/`: raw evaluator-model judgments on model responses
- `evaluations/llm_merged/`: merged evaluator outputs with per-response aggregate statistics
- `evaluations/sampled_review_sets/`: sampled low-score review subsets used for agreement analysis
The export preserves the original source JSON structure where possible and adds deterministic `response_id` and `evaluation_id` fields so that the tables can be joined reliably.
## Repository Structure
```text
.
├── README.md
├── responses/
│   ├── validation/
│   └── test/
├── evaluations/
│   ├── human_raw/
│   ├── llm_raw/
│   ├── llm_merged/
│   └── sampled_review_sets/
├── metadata/
│   ├── export_summary.json
│   ├── normalization.json
│   └── provenance.json
└── checksums/
    └── sha256_manifest.json
```
## Included Configurations
- `responses`: all response tables for validation and test
- `human_raw_scores`: anonymized human appropriateness scores on the validation subset
- `sampled_review_sets`: sampled evaluator subsets used for manual comparison
- `llm_raw_evaluations_*`: raw evaluator runs grouped by evaluator model
- `llm_merged_evaluations_*`: merged evaluator outputs grouped by evaluator model
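The per-evaluator configuration names follow a visible pattern: the evaluator directory name with hyphens turned into underscores, prefixed by `llm_raw_evaluations_` or `llm_merged_evaluations_`. A minimal sketch of that mapping is shown here; `config_name` and `EVALUATOR_DIRS` are hypothetical names introduced for illustration, not part of the dataset or the `datasets` library.

```python
import re

# Evaluator directory names as they appear under evaluations/llm_raw/ in this card.
EVALUATOR_DIRS = [
    "gpt-4o-mini",
    "gpt-5-nano",
    "llama-4-scout-17b-16e-instruct",
]


def config_name(kind: str, evaluator_dir: str) -> str:
    """Build a config name such as 'llm_raw_evaluations_gpt_4o_mini'.

    Hypothetical helper: it only mirrors the naming pattern visible in the
    card's `configs` list. `kind` must be 'raw' or 'merged'.
    """
    if kind not in {"raw", "merged"}:
        raise ValueError(f"unknown kind: {kind!r}")
    # Replace every run of non-alphanumeric characters with one underscore.
    slug = re.sub(r"[^0-9a-z]+", "_", evaluator_dir.lower()).strip("_")
    return f"llm_{kind}_evaluations_{slug}"


# All six evaluator configurations listed in the front matter.
ALL_EVAL_CONFIGS = [
    config_name(kind, name) for kind in ("raw", "merged") for name in EVALUATOR_DIRS
]
```

Note that only `gpt-4o-mini` has a test split; the other evaluator configurations expose validation only.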
## Row Counts
- each validation response file: 618 rows
- each test response file: 6,138 rows
- each validation raw evaluator file: 1,854 rows
- each test raw evaluator file under `gpt-4o-mini`: 18,414 rows
- each merged validation evaluator file: 618 rows
- each merged test evaluator file under `gpt-4o-mini`: 6,138 rows
- `evaluations/human_raw/validation/H1.parquet`: 206 rows
- `evaluations/human_raw/validation/H2.parquet`: 206 rows
- each sampled review file: 206 rows
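The counts above are arithmetically consistent with each other: every raw evaluator file has exactly three times as many rows as the matching response file, and each human score file has one third of the validation response rows. The sketch below only checks that arithmetic. Reading the factor of three as "three evaluator runs per response" is an assumption, not something the card states, and `rows_per_unit` is a hypothetical helper.

```python
# Row counts copied from the "Row Counts" list above.
RESPONSE_ROWS = {"validation": 618, "test": 6_138}
RAW_EVAL_ROWS = {"validation": 1_854, "test": 18_414}  # test count is for gpt-4o-mini
MERGED_EVAL_ROWS = {"validation": 618, "test": 6_138}
HUMAN_FILE_ROWS = 206


def rows_per_unit(total: int, unit: int) -> int:
    """Return total / unit, raising if the division is not exact."""
    quotient, remainder = divmod(total, unit)
    if remainder:
        raise ValueError(f"{total} is not a multiple of {unit}")
    return quotient


# Raw evaluator rows per response row, for each split.
RAW_PER_RESPONSE = {
    split: rows_per_unit(RAW_EVAL_ROWS[split], RESPONSE_ROWS[split])
    for split in RESPONSE_ROWS
}

# Validation response rows per row of a single human score file.
RESPONSES_PER_HUMAN_ROW = rows_per_unit(RESPONSE_ROWS["validation"], HUMAN_FILE_ROWS)
```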
## Schema Notes
Key columns include:
- `response_id`: deterministic export ID for one model response to one example
- `example_id`: foreign key to the benchmark repo
- `answer_model_raw` / `answer_model`
- `evaluator_model_raw` / `evaluator_model`
- `response_text`
- `score`, `score_runs`, `score_std`
- `source_*` columns for provenance
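As an illustration of how `score`, `score_runs`, and `score_std` might relate, the sketch below recomputes a mean and standard deviation from a list of per-run scores. This rests on assumptions the card does not confirm: that `score_runs` holds one numeric score per evaluator run, that `score` is their mean, and that `score_std` is a standard deviation (population or sample is not stated, so both are offered). `summarize_runs` is a hypothetical helper.

```python
from statistics import fmean, pstdev, stdev


def summarize_runs(score_runs, sample=False):
    """Recompute aggregate statistics from per-run scores.

    Assumption: `score_runs` is a non-empty sequence of numbers. With
    sample=False the population standard deviation is used; with sample=True
    the sample standard deviation is used. Which one the export uses is unknown.
    """
    runs = list(score_runs)
    if not runs:
        raise ValueError("score_runs must not be empty")
    if len(runs) < 2:
        spread = 0.0  # a single run has no spread
    else:
        spread = stdev(runs) if sample else pstdev(runs)
    return {"score": fmean(runs), "score_std": spread}
```

Comparing the recomputed values against the stored `score` and `score_std` columns on a few rows is one way to find out which convention, if either, the export follows.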
Some source JSON rows stored answer payloads or explanations as non-string objects. Those were serialized safely into strings for Parquet compatibility while preserving the original content.
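If the structured form of those fields is needed, a tolerant parser can attempt to recover it. The sketch below assumes the non-string objects were serialized as JSON, which the card does not state; values that do not parse are returned unchanged. `maybe_parse_json` is a hypothetical helper.

```python
import json


def maybe_parse_json(value):
    """Return the decoded object if `value` looks like a JSON object or array.

    Assumption: non-string payloads were serialized with JSON. Anything that
    is not a string, or that fails to parse, is returned as-is.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    # Only attempt to decode strings that start like a JSON object or array,
    # so ordinary prose answers are never touched.
    if not text or text[0] not in "[{":
        return value
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return value
```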
## Loading Examples
```python
from datasets import load_dataset

# Repository ID as listed for this dataset; each call loads one named configuration.
REPO_ID = "arnaiztech/llms-mental-health-crisis-responses"
responses = load_dataset(REPO_ID, "responses")
human_scores = load_dataset(REPO_ID, "human_raw_scores")
raw_eval = load_dataset(REPO_ID, "llm_raw_evaluations_gpt_4o_mini")
```
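Once loaded, the tables can be combined on the deterministic `response_id` key. The following is a minimal sketch in plain Python over rows represented as dictionaries. It assumes that evaluation rows carry a `response_id` column, which the card implies but does not list per table, and `join_on_response_id` is a hypothetical helper.

```python
def join_on_response_id(responses, evaluations):
    """Inner-join evaluation rows to response rows on `response_id`.

    Both arguments are iterables of dict-like rows. Evaluation rows with no
    matching response are dropped. On column-name conflicts the response
    value wins, so response fields are never overwritten.
    """
    by_id = {row["response_id"]: row for row in responses}
    joined = []
    for evaluation in evaluations:
        response = by_id.get(evaluation["response_id"])
        if response is None:
            continue
        joined.append({**evaluation, **response})
    return joined


# Toy rows standing in for real data; the values are made up for illustration.
toy_responses = [
    {"response_id": "r1", "example_id": "e1", "response_text": "..."},
    {"response_id": "r2", "example_id": "e2", "response_text": "..."},
]
toy_evaluations = [
    {"response_id": "r1", "score": 4.0},
    {"response_id": "r3", "score": 2.0},  # no matching response, dropped
]
toy_joined = join_on_response_id(toy_responses, toy_evaluations)
```

Because iterating over a `datasets.Dataset` split yields one dictionary per row, the same function can be given two loaded splits directly.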
## Intended Uses
This repo is intended for:
- analysis of LLM responses to crisis-related conversations
- appropriateness and agreement studies
- evaluator-model comparison
- reproducing the response-quality analyses in the paper
It is not intended to be used as direct therapeutic advice or as a substitute for clinical review.
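For the agreement studies listed above, one common starting point is Cohen's kappa between two raters' scores on the same items, for example a human rater and an evaluator model. The sketch below is a plain unweighted kappa over aligned lists of categorical scores. It is not claimed to be the agreement measure used in the paper, and how the two score lists are aligned is left to the reader.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two aligned sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For ordinal scores, a weighted kappa or a rank correlation may be more informative than the unweighted form shown here.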
## Provenance
- exported from the original repository JSON files
- joins validated against `inputs` and, where needed, answer text
- integrity metadata stored in `metadata/provenance.json`
- per-file digests stored in `checksums/sha256_manifest.json`
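The per-file digests make it possible to check a local copy for corruption. The sketch below assumes the manifest is a JSON object mapping relative file paths to hexadecimal SHA-256 digests. The actual layout of `checksums/sha256_manifest.json` is not described in this card, so the loading step may need adapting. `sha256_of` and `verify_manifest` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose digest differs.

    Assumption: `manifest` maps relative paths to hex SHA-256 digests.
    """
    root = Path(root)
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(relative_path)
    return failures
```

Under the same assumption, the manifest can be read with `json.load` and passed in together with the local repository root; an empty result means every listed file matched.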
## Citation
```bibtex
@article{arnaiz2026between,
author = {Arnaiz-Rodriguez, Adrian and Baidal, M. and Derner, E. and Annable, J. L. and Ball, M. and Ince, M. and Perez Vallejos, E. and Oliver, N.},
title = {Between Help and Harm: An Evaluation of Mental Health Crisis Handling by {LLMs}},
journal = {JMIR Mental Health},
year = {2026},
volume = {forthcoming},
pages = {88435},
doi = {10.2196/88435},
url = {https://arxiv.org/abs/2509.24857},
note = {In press}
}
```
## License
This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International license (`cc-by-nc-4.0`).