masked-kunsiquat/shreevastava-cognitive-distortions
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/masked-kunsiquat/shreevastava-cognitive-distortions
Dataset description:
---
language:
- en
license: mit
task_categories:
- text-classification
tags:
- cbt
- cognitive-behavioral-therapy
- cognitive-distortions
- mental-health
- text-classification
- multi-label-classification
configs:
- config_name: default
  data_files:
  - split: train
    path:
    - data/corpus.jsonl
    - data/synth_disqualifying_positive.jsonl
    - data/synth_blame.jsonl
- config_name: corpus
  data_files:
  - split: train
    path: data/corpus.jsonl
- config_name: synth
  data_files:
  - split: train
    path:
    - data/synth_disqualifying_positive.jsonl
    - data/synth_blame.jsonl
---
Multi-label cognitive distortion classification dataset used to train the
`DistortionMlp` head in the [Lattice](https://github.com/Masked-Kunsiquat/Lattice)
on-device CBT journaling app.
## Sources
| Config | File | Rows | Classes covered | Origin |
|---|---|---|---|---|
| `corpus` | `data/corpus.jsonl` | 2530 | 10 (all except `DISQUALIFYING_POSITIVE`, `BLAME`) | Shreevastava et al. therapy forum posts — human-annotated |
| `synth` | `data/synth_disqualifying_positive.jsonl` | 307 | `DISQUALIFYING_POSITIVE` | Claude-generated synthetic |
| `synth` | `data/synth_blame.jsonl` | 339 | `BLAME` | Claude-generated synthetic |
The `default` config combines all three (3176 rows total).
## Schema
Each row:
```json
{"text": "...", "labels": [false, true, false, ...]}
```
`labels` is a 12-element boolean array. Index → class mapping:
| Index | `CognitiveDistortion` | Burns label |
|---|---|---|
| 0 | `ALL_OR_NOTHING` | All-or-Nothing Thinking |
| 1 | `OVERGENERALIZATION` | Overgeneralization |
| 2 | `MENTAL_FILTER` | Mental Filter |
| 3 | `DISQUALIFYING_POSITIVE` | Disqualifying the Positive |
| 4 | `MIND_READING` | Mind Reading |
| 5 | `FORTUNE_TELLING` | Fortune-Telling |
| 6 | `CATASTROPHIZING` | Magnification / Catastrophizing |
| 7 | `EMOTIONAL_REASONING` | Emotional Reasoning |
| 8 | `SHOULD_STATEMENTS` | Should Statements |
| 9 | `LABELING` | Labeling |
| 10 | `PERSONALIZATION` | Personalization |
| 11 | `BLAME` | Blame |
Rows with all-false labels represent "No Distortion" examples.
Multi-label rows have more than one `true` entry.
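To make the index mapping concrete, here is a minimal sketch that turns a `labels` array back into class names. The helper name `decode_labels` and the example row are illustrative assumptions, not part of the dataset or the Lattice codebase; the class names are copied from the table above.

```python
# Class names in index order, copied from the schema table.
CLASS_NAMES = [
    "ALL_OR_NOTHING", "OVERGENERALIZATION", "MENTAL_FILTER",
    "DISQUALIFYING_POSITIVE", "MIND_READING", "FORTUNE_TELLING",
    "CATASTROPHIZING", "EMOTIONAL_REASONING", "SHOULD_STATEMENTS",
    "LABELING", "PERSONALIZATION", "BLAME",
]


def decode_labels(labels):
    """Return the names of all classes whose flag is true (hypothetical helper)."""
    if len(labels) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} flags, got {len(labels)}")
    return [name for name, flag in zip(CLASS_NAMES, labels) if flag]


# Illustrative multi-label row, not taken from the dataset.
example = [False] * 12
example[4] = True
example[5] = True

print(decode_labels(example))       # ['MIND_READING', 'FORTUNE_TELLING']
print(decode_labels([False] * 12))  # [] corresponds to "No Distortion"
```

An empty result is how a "No Distortion" row shows up after decoding, so downstream code may want to treat it as its own case rather than as missing data.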
## Corpus label mapping
The Shreevastava et al. corpus uses different label strings. The mapping applied
during ingestion (`DistortionCorpusMapper`):
| Corpus label | Index |
|---|---|
| All-or-nothing thinking | 0 |
| Overgeneralization | 1 |
| Mental filter | 2 |
| Mind Reading | 4 |
| Fortune-telling | 5 |
| Magnification | 6 |
| Emotional Reasoning | 7 |
| Should statements | 8 |
| Labeling | 9 |
| Personalization | 10 |
| No Distortion | none (all-false vector) |
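The mapping above can be sketched as a lookup table that produces the 12-element boolean vector. This is an illustration only and not the actual `DistortionCorpusMapper` code: the function name `encode_corpus_labels` is hypothetical, and exact, case-sensitive string matching against the corpus labels is an assumption.

```python
# Corpus label string -> index in the 12-element label vector (from the table above).
CORPUS_LABEL_TO_INDEX = {
    "All-or-nothing thinking": 0,
    "Overgeneralization": 1,
    "Mental filter": 2,
    "Mind Reading": 4,
    "Fortune-telling": 5,
    "Magnification": 6,
    "Emotional Reasoning": 7,
    "Should statements": 8,
    "Labeling": 9,
    "Personalization": 10,
}
NUM_CLASSES = 12


def encode_corpus_labels(corpus_labels):
    """Build the boolean label vector for one post (hypothetical helper)."""
    vector = [False] * NUM_CLASSES
    for label in corpus_labels:
        if label == "No Distortion":
            continue  # leaves the vector all-false
        # Assumption: unknown label strings should fail loudly (KeyError).
        vector[CORPUS_LABEL_TO_INDEX[label]] = True
    return vector


encoded = encode_corpus_labels(["Magnification", "Labeling"])
print([i for i, flag in enumerate(encoded) if flag])  # [6, 9]
```

Indices 3 and 11 never appear in this table, which is why the corpus alone cannot supply `DISQUALIFYING_POSITIVE` or `BLAME` examples.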
## Usage
```python
from collections import Counter

from datasets import load_dataset

# Full combined dataset (corpus plus both synthetic files)
ds = load_dataset("masked-kunsiquat/shreevastava-cognitive-distortions")

# Corpus only
corpus = load_dataset("masked-kunsiquat/shreevastava-cognitive-distortions", "corpus")

# Check class balance: count how often each label index is true
label_counts = Counter()
for row in ds["train"]:
    for i, v in enumerate(row["labels"]):
        if v:
            label_counts[i] += 1
```
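Per-class counts do not show how many rows are "No Distortion" or carry several labels at once. A minimal sketch of a fuller summary follows; `summarize` is a hypothetical helper that accepts any iterable of rows in the schema above, and the rows in the example are made up so that it runs without downloading the dataset.

```python
from collections import Counter


def summarize(rows):
    """Count per-index positives, all-false rows, and multi-label rows."""
    per_class = Counter()
    no_distortion = 0
    multi_label = 0
    for row in rows:
        active = [i for i, flag in enumerate(row["labels"]) if flag]
        per_class.update(active)
        if not active:
            no_distortion += 1
        elif len(active) > 1:
            multi_label += 1
    return {
        "per_class": dict(per_class),
        "no_distortion": no_distortion,
        "multi_label": multi_label,
    }


def make_row(*indices):
    """Build a toy row with the given label indices set (illustration only)."""
    labels = [False] * 12
    for i in indices:
        labels[i] = True
    return {"text": "...", "labels": labels}


toy_rows = [make_row(), make_row(6), make_row(4, 5)]
print(summarize(toy_rows))
```

With the real data, passing a loaded `train` split in place of `toy_rows` should work the same way, since the function only relies on each row exposing a `labels` list.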
Provider:
masked-kunsiquat



