masked-kunsiquat/shreevastava-cognitive-distortions

Name: masked-kunsiquat/shreevastava-cognitive-distortions
Creator: masked-kunsiquat
Published: 2026-04-17 20:28:03
License: MIT

Source: Hugging Face. Updated 2026-04-17. Indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/masked-kunsiquat/shreevastava-cognitive-distortions

Description:

---
language:
- en
license: mit
task_categories:
- text-classification
tags:
- cbt
- cognitive-behavioral-therapy
- cognitive-distortions
- mental-health
- text-classification
- multi-label-classification
configs:
- config_name: default
  data_files:
  - split: train
    path:
    - data/corpus.jsonl
    - data/synth_disqualifying_positive.jsonl
    - data/synth_blame.jsonl
- config_name: corpus
  data_files:
  - split: train
    path: data/corpus.jsonl
- config_name: synth
  data_files:
  - split: train
    path:
    - data/synth_disqualifying_positive.jsonl
    - data/synth_blame.jsonl
---

Multi-label cognitive distortion classification dataset used to train the `DistortionMlp` head in the [Lattice](https://github.com/Masked-Kunsiquat/Lattice) on-device CBT journaling app.

## Sources

| Config | File | Rows | Classes covered | Origin |
|---|---|---|---|---|
| `corpus` | `data/corpus.jsonl` | 2530 | 10 (all except `DISQUALIFYING_POSITIVE`, `BLAME`) | Shreevastava et al. therapy forum posts (human-annotated) |
| `synth` | `data/synth_disqualifying_positive.jsonl` | 307 | `DISQUALIFYING_POSITIVE` | Claude-generated synthetic |
| `synth` | `data/synth_blame.jsonl` | 339 | `BLAME` | Claude-generated synthetic |

The `default` config combines all three (3176 rows total).

## Schema

Each row:

```json
{"text": "...", "labels": [false, true, false, ...]}
```

`labels` is a 12-element boolean array.
Index → class mapping:

| Index | `CognitiveDistortion` | Burns label |
|---|---|---|
| 0 | `ALL_OR_NOTHING` | All-or-Nothing Thinking |
| 1 | `OVERGENERALIZATION` | Overgeneralization |
| 2 | `MENTAL_FILTER` | Mental Filter |
| 3 | `DISQUALIFYING_POSITIVE` | Disqualifying the Positive |
| 4 | `MIND_READING` | Mind Reading |
| 5 | `FORTUNE_TELLING` | Fortune-Telling |
| 6 | `CATASTROPHIZING` | Magnification / Catastrophizing |
| 7 | `EMOTIONAL_REASONING` | Emotional Reasoning |
| 8 | `SHOULD_STATEMENTS` | Should Statements |
| 9 | `LABELING` | Labeling |
| 10 | `PERSONALIZATION` | Personalization |
| 11 | `BLAME` | Blame |

Rows with all-false labels represent "No Distortion" examples. Multi-label rows have more than one `true` entry.

## Corpus label mapping

The Shreevastava et al. corpus uses different label strings. The mapping applied during ingestion (`DistortionCorpusMapper`):

| Corpus label | Index |
|---|---|
| All-or-nothing thinking | 0 |
| Overgeneralization | 1 |
| Mental filter | 2 |
| Mind Reading | 4 |
| Fortune-telling | 5 |
| Magnification | 6 |
| Emotional Reasoning | 7 |
| Should statements | 8 |
| Labeling | 9 |
| Personalization | 10 |
| No Distortion | all-zeros |

## Usage

```python
from collections import Counter

from datasets import load_dataset

# Full combined dataset
ds = load_dataset("masked-kunsiquat/shreevastava-cognitive-distortions")

# Corpus only
corpus = load_dataset("masked-kunsiquat/shreevastava-cognitive-distortions", "corpus")

# Check class balance: count how many rows set each label index to true
label_counts = Counter()
for row in ds["train"]:
    for i, v in enumerate(row["labels"]):
        if v:
            label_counts[i] += 1
```
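The class-balance loop above reports raw indices. As a minimal sketch, the index → class table from the Schema section can be turned into a lookup that decodes each `labels` array into class names. The helper names `decode_labels` and `count_classes`, the `NO_DISTORTION` bucket name, and the example rows are assumptions made for illustration; they are not part of the dataset or the Lattice codebase.

```python
from collections import Counter

# Index -> class name, copied from the index table in the Schema section.
CLASS_NAMES = [
    "ALL_OR_NOTHING",
    "OVERGENERALIZATION",
    "MENTAL_FILTER",
    "DISQUALIFYING_POSITIVE",
    "MIND_READING",
    "FORTUNE_TELLING",
    "CATASTROPHIZING",
    "EMOTIONAL_REASONING",
    "SHOULD_STATEMENTS",
    "LABELING",
    "PERSONALIZATION",
    "BLAME",
]


def decode_labels(labels):
    """Return the class names whose flag is true; an empty list means "No Distortion"."""
    if len(labels) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} labels, got {len(labels)}")
    return [name for name, flag in zip(CLASS_NAMES, labels) if flag]


def count_classes(rows):
    """Count how often each class name appears across an iterable of rows."""
    counts = Counter()
    for row in rows:
        names = decode_labels(row["labels"])
        # "NO_DISTORTION" is a hypothetical bucket name for all-false rows.
        counts.update(names or ["NO_DISTORTION"])
    return counts


# Hand-written rows in the documented schema (illustrative, not taken from the dataset).
example_rows = [
    {"text": "example a", "labels": [True] + [False] * 11},
    {"text": "example b", "labels": [False] * 5 + [True, True] + [False] * 5},
    {"text": "example c", "labels": [False] * 12},
]
print(count_classes(example_rows))
```

The same `count_classes` call should work on a loaded split, for example `count_classes(ds["train"])`, provided each row exposes `labels` as a 12-element list as described in the Schema section.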

Provider:

masked-kunsiquat
