arnaiztech/llms-mental-health-crisis-responses

Name: arnaiztech/llms-mental-health-crisis-responses
Creator: arnaiztech
Published: 2026-04-13 12:36:27
License: No description available

Hugging Face · Updated 2026-04-13 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/arnaiztech/llms-mental-health-crisis-responses

Resource description:

---
pretty_name: Between Help and Harm - Responses and Evaluations
license: cc-by-nc-4.0
language:
- en
multilinguality: monolingual
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- found
tags:
- mental-health
- safety
- llm-evaluation
- response-quality
- conversations
configs:
- config_name: responses
  default: true
  data_files:
  - split: validation
    path: responses/validation/*.parquet
  - split: test
    path: responses/test/*.parquet
- config_name: human_raw_scores
  data_files:
  - split: validation
    path: evaluations/human_raw/validation/*.parquet
- config_name: sampled_review_sets
  data_files:
  - split: validation
    path: evaluations/sampled_review_sets/*.parquet
- config_name: llm_raw_evaluations_gpt_4o_mini
  data_files:
  - split: validation
    path: evaluations/llm_raw/gpt-4o-mini/validation/*.parquet
  - split: test
    path: evaluations/llm_raw/gpt-4o-mini/test/*.parquet
- config_name: llm_raw_evaluations_gpt_5_nano
  data_files:
  - split: validation
    path: evaluations/llm_raw/gpt-5-nano/validation/*.parquet
- config_name: llm_raw_evaluations_llama_4_scout_17b_16e_instruct
  data_files:
  - split: validation
    path: evaluations/llm_raw/llama-4-scout-17b-16e-instruct/validation/*.parquet
- config_name: llm_merged_evaluations_gpt_4o_mini
  data_files:
  - split: validation
    path: evaluations/llm_merged/gpt-4o-mini/validation/*.parquet
  - split: test
    path: evaluations/llm_merged/gpt-4o-mini/test/*.parquet
- config_name: llm_merged_evaluations_gpt_5_nano
  data_files:
  - split: validation
    path: evaluations/llm_merged/gpt-5-nano/validation/*.parquet
- config_name: llm_merged_evaluations_llama_4_scout_17b_16e_instruct
  data_files:
  - split: validation
    path: evaluations/llm_merged/llama-4-scout-17b-16e-instruct/validation/*.parquet
---

# Dataset Card for Between Help and Harm - Responses and Evaluations

## Dataset Summary

This dataset repository contains the response-side artifacts of the paper *Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs*, published in *JMIR Mental Health*, packaged for Hugging Face. If you use this dataset, please cite the paper. The citation is given below; the arXiv version is available at <https://arxiv.org/abs/2509.24857>, and the final DOI is <https://doi.org/10.2196/88435>.

The repository is organized around the response-level evaluation task:

- `responses/`: model responses aligned to the canonical `example_id`
- `evaluations/human_raw/`: anonymized human appropriateness scores on the validation subset
- `evaluations/llm_raw/`: raw evaluator-model judgments on model responses
- `evaluations/llm_merged/`: merged evaluator outputs with per-response aggregate statistics
- `evaluations/sampled_review_sets/`: sampled low-score review subsets used for agreement analysis

The export preserves the original source JSON structure where possible and adds deterministic `response_id` and `evaluation_id` fields so that the tables can be joined reliably.

## Repository Structure

```text
.
├── README.md
├── responses/
│   ├── validation/
│   └── test/
├── evaluations/
│   ├── human_raw/
│   ├── llm_raw/
│   ├── llm_merged/
│   └── sampled_review_sets/
├── metadata/
│   ├── export_summary.json
│   ├── normalization.json
│   └── provenance.json
└── checksums/
    └── sha256_manifest.json
```

## Included Configurations

- `responses`: all response tables for validation and test
- `human_raw_scores`: anonymized human appropriateness scores on the validation subset
- `sampled_review_sets`: sampled evaluator subsets used for manual comparison
- `llm_raw_evaluations_*`: raw evaluator runs, grouped by evaluator model (`gpt_4o_mini`, `gpt_5_nano`, `llama_4_scout_17b_16e_instruct`)
- `llm_merged_evaluations_*`: merged evaluator outputs, grouped by evaluator model

Only the `gpt-4o-mini` evaluator configurations include a `test` split; the other evaluation configurations are validation-only.

## Row Counts

- each validation response file: 618 rows
- each test response file: 6,138 rows
- each validation raw evaluator file: 1,854 rows
- each test raw evaluator file under `gpt-4o-mini`: 18,414 rows
- each merged validation evaluator file: 618 rows
- each merged test evaluator file under `gpt-4o-mini`: 6,138 rows
- `evaluations/human_raw/validation/H1.parquet`: 206 rows
- `evaluations/human_raw/validation/H2.parquet`: 206 rows
- each sampled review file: 206 rows

## Schema Notes

Key columns include:

- `response_id`: deterministic export ID for one model response to one example
- `example_id`: foreign key to the benchmark repo
- `answer_model_raw` / `answer_model`
- `evaluator_model_raw` / `evaluator_model`
- `response_text`
- `score`, `score_runs`, `score_std`
- `source_*` columns for provenance

Some source JSON rows stored answer payloads or explanations as non-string objects. These were serialized into strings for Parquet compatibility, preserving the original content.

## Loading Examples

```python
from datasets import load_dataset

# Repository ID as listed on the Hugging Face Hub.
REPO_ID = "arnaiztech/llms-mental-health-crisis-responses"

# Without a `split` argument, each call returns a DatasetDict keyed by split name.
responses = load_dataset(REPO_ID, "responses")
human_scores = load_dataset(REPO_ID, "human_raw_scores")
raw_eval = load_dataset(REPO_ID, "llm_raw_evaluations_gpt_4o_mini")
```

## Intended Uses

This repo is intended for:

- analysis of LLM responses to crisis-related conversations
- appropriateness and agreement studies
- evaluator-model comparison
- reproducing the response-quality analyses in the paper

It is not intended to be used as direct therapeutic advice or as a substitute for clinical review.

## Provenance

- exported from the original repository JSON files
- joins validated against `inputs` and, where needed, answer text
- integrity metadata stored in `metadata/provenance.json`
- per-file digests stored in `checksums/sha256_manifest.json`

## Citation

```bibtex
@article{arnaiz2026between,
  author  = {Arnaiz-Rodriguez, Adrian and Baidal, M. and Derner, E. and Annable, J. L. and Ball, M. and Ince, M. and Perez Vallejos, E. and Oliver, N.},
  title   = {Between Help and Harm: An Evaluation of Mental Health Crisis Handling by {LLMs}},
  journal = {JMIR Mental Health},
  year    = {2026},
  volume  = {forthcoming},
  pages   = {88435},
  doi     = {10.2196/88435},
  url     = {https://arxiv.org/abs/2509.24857},
  note    = {In press}
}
```

## License

This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International license (`cc-by-nc-4.0`).
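The Schema Notes describe `response_id` as the deterministic ID of one model response to one example, and the summary says it was added so the tables can be joined reliably. Below is a minimal, standard-library sketch of such a join. It assumes each table has already been materialized as a list of dicts (for instance with `Dataset.to_list()` on a loaded split) and that evaluation rows carry a `response_id` column; `join_on_response_id` and the toy rows are hypothetical and not part of the dataset's own tooling.

```python
from typing import Any

Row = dict[str, Any]


def join_on_response_id(responses: list[Row], evaluations: list[Row]) -> list[Row]:
    """Inner-join evaluation rows onto response rows via `response_id`.

    Hypothetical helper: assumes `response_id` is unique in `responses` and may
    repeat in `evaluations` (e.g. one row per evaluator run or judgment).
    """
    by_id: dict[str, Row] = {}
    for row in responses:
        rid = row["response_id"]
        if rid in by_id:
            raise ValueError(f"duplicate response_id in responses: {rid!r}")
        by_id[rid] = row

    joined: list[Row] = []
    for ev in evaluations:
        base = by_id.get(ev["response_id"])
        if base is None:
            continue  # no matching response row: drop it (inner-join semantics)
        # On a column-name clash, the evaluation row's value wins.
        joined.append({**base, **ev})
    return joined


# Toy rows standing in for `Dataset.to_list()` output; values are illustrative only.
responses = [
    {"response_id": "r1", "example_id": "e1", "response_text": "..."},
    {"response_id": "r2", "example_id": "e2", "response_text": "..."},
]
evaluations = [
    {"evaluation_id": "v1", "response_id": "r1", "score": 4},
    {"evaluation_id": "v2", "response_id": "r1", "score": 5},
    {"evaluation_id": "v3", "response_id": "r9", "score": 2},
]
joined = join_on_response_id(responses, evaluations)
```

Because the join is inner, evaluation rows whose `response_id` is absent from the loaded responses are dropped silently; comparing `len(joined)` with `len(evaluations)` is a cheap consistency check. For larger tables, `Dataset.to_pandas()` followed by a pandas `DataFrame.merge(..., on="response_id")` does the same job.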
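The row counts are consistent with each response being judged three times per evaluator: 1,854 = 3 × 618 raw validation rows against 618 merged rows and, under `gpt-4o-mini`, 18,414 = 3 × 6,138 raw test rows against 6,138 merged rows. Together with the `score`, `score_runs`, and `score_std` columns, this suggests that each merged row summarizes repeated runs for one `response_id`. The card does not state the exact aggregation, so the sketch below is an assumption: it takes the mean, the run count, and the population standard deviation, whereas the paper may use a different estimator (for example, the sample standard deviation). `merge_runs` and the toy rows are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Any


def merge_runs(raw_rows: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Aggregate repeated evaluator scores per `response_id`.

    Assumption (not confirmed by the card): merged `score` = mean over runs,
    `score_runs` = number of runs, `score_std` = population standard deviation.
    """
    scores: dict[str, list[float]] = defaultdict(list)
    for row in raw_rows:
        scores[row["response_id"]].append(float(row["score"]))

    return {
        rid: {
            "score": statistics.fmean(vals),
            "score_runs": len(vals),
            "score_std": statistics.pstdev(vals),
        }
        for rid, vals in scores.items()
    }


# Toy raw rows: three runs for r1, two for r2 (values are illustrative only).
raw = [
    {"response_id": "r1", "score": 4},
    {"response_id": "r1", "score": 5},
    {"response_id": "r1", "score": 3},
    {"response_id": "r2", "score": 2},
    {"response_id": "r2", "score": 2},
]
merged = merge_runs(raw)
```

With the real data, `raw_rows` could come from `Dataset.to_list()` on a raw evaluator split, and the result could be compared against the matching `llm_merged_evaluations_*` config to test whether this assumption holds.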
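The Provenance section says per-file digests are stored in `checksums/sha256_manifest.json`. The card does not document that file's layout, so this sketch assumes a flat JSON object mapping repository-relative paths to hex SHA-256 digests; adjust the parsing if the actual manifest is nested (for example, under a top-level key). `sha256_of` and `verify_manifest` are hypothetical helpers.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large Parquet shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(
    repo_root: Path, manifest_rel: str = "checksums/sha256_manifest.json"
) -> list[str]:
    """Return relative paths that are missing or whose digest does not match.

    Assumed manifest layout: {"responses/validation/x.parquet": "<hex digest>", ...}.
    """
    manifest = json.loads((repo_root / manifest_rel).read_text(encoding="utf-8"))
    problems: list[str] = []
    for rel_path, expected in manifest.items():
        target = repo_root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            problems.append(rel_path)
    return problems
```

To run it against the real files, first download a local copy of the repository, for example with `huggingface_hub.snapshot_download(repo_id="arnaiztech/llms-mental-health-crisis-responses", repo_type="dataset")`, which returns the local folder path, then call `verify_manifest(Path(local_dir))`. An empty list means every listed file matched its recorded digest.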

Provided by:

arnaiztech