dzur658/therapy-conversations-full-small
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/dzur658/therapy-conversations-full-small
Description:
---
license: apache-2.0
task_categories:
- token-classification
language:
- en
tags:
- medical
size_categories:
- 1K<n<10K
---
# Therapy Conversations Full Small
## About the dataset
This dataset contains 3,161 unique, synthetically generated multi-turn conversations between a patient and a therapist. MiniMax M 2.5 was used to generate
the transcripts, and each conversation also has associated metadata attached.
## Understanding an Object in the Dataset
#### Top Level Keys
- `conversation` contains the conversation itself in JSON format.
- `fingerprint` contains metadata about the conversation (demographics, personality traits of both parties, etc.).
- `full text` contains the full text of the transcript, formatted as it would be ingested by the [knowledge graph generator API](https://github.com/dzur658/therapy-bert/blob/main/knowledge_graph_api.py).
- `knowledge_graph_shard` contains the `entities` and `relations` that MiniMax M 2.5 extracted.
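As a rough illustration of the layout described above, the sketch below checks a record for the four documented top-level keys. The `sample` record and the `missing_keys` helper are hypothetical, and the placeholder values (an empty `turns` list, empty `entities` and `relations` lists) are assumptions about the nested types rather than something the dataset card confirms.

```python
# The four top-level keys named in the dataset card.
EXPECTED_KEYS = {"conversation", "fingerprint", "full text", "knowledge_graph_shard"}


def missing_keys(record: dict) -> set:
    """Return the documented top-level keys that are absent from a record."""
    return EXPECTED_KEYS - set(record)


# Hypothetical record shaped after the key list; all values are placeholders.
sample = {
    "conversation": {"turns": []},
    "fingerprint": {},
    "full text": "",
    "knowledge_graph_shard": {"entities": [], "relations": []},
}

print(missing_keys(sample))  # prints set() because all four keys are present
```

A check like this can be a cheap first step before any downstream processing, since a non-empty result pinpoints which part of a record is absent.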
#### `conversation`
`conversation` contains a `turns` object. The `turns` object contains sub-objects, each of which
is one "turn" of the conversation. Each turn is made up of two keys: `speaker` and `text`.
Use this field if you're inspecting lines individually within a conversation; it is pre-parsed for you.
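A minimal sketch of walking the turns, assuming `turns` is stored as a list of dictionaries. The sample record, the speaker labels `"patient"` and `"therapist"`, and the `iter_turns` helper are illustrative assumptions; the actual label strings in the dataset may differ.

```python
from collections import Counter


def iter_turns(record):
    """Yield (speaker, text) pairs from a record's pre-parsed conversation."""
    for turn in record["conversation"]["turns"]:
        yield turn["speaker"], turn["text"]


# Hypothetical record; the real speaker labels and text will differ.
sample = {
    "conversation": {
        "turns": [
            {"speaker": "therapist", "text": "What brings you in today?"},
            {"speaker": "patient", "text": "I have been feeling anxious at work."},
            {"speaker": "therapist", "text": "Tell me more about that."},
        ]
    }
}

# Count how many turns each speaker took.
counts = Counter(speaker for speaker, _ in iter_turns(sample))
for speaker, n in counts.items():
    print(speaker, n)
```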
#### `fingerprint`
`fingerprint` contains the following metadata about the conversation:
- `age`: 18-75
- `gender`: male, female, transgender woman, transgender man, non-binary
- `occupation`: software engineer, teacher, nurse, artist, salesperson,
retired, student, unemployed, entrepreneur, stay-at-home parent
- `presenting_issue`: anxiety, depression, relationship issues, work stress,
grief, self-esteem issues, sexuality issues, trauma, substance abuse,
eating disorders, chronic illness, gender dysphoria, identity issues,
family conflict, life transitions, body dysmorphia, obsessive-compulsive disorder,
phobias, sleep disorders, anger management issues
- `relationship_status`: single, in a relationship, married, divorced, widowed
- `living_situation`: living alone, living with family, living with roommates, living with partner, living in a group home, living in a shelter
- `therapy_modality`: cognitive-behavioral therapy, psychodynamic therapy, humanistic therapy,
integrative therapy, mindfulness-based therapy, art therapy,
dialectical behavior therapy, acceptance and commitment therapy,
eye movement desensitization and reprocessing (EMDR), exposure therapy
- `patient_speaking_style`: verbose, concise, emotional, logical, intellectualizing, narrative,
disorganized, rambling, focused, tangential, metaphorical, literal
- `therapist_speaking_style`: empathetic, direct, analytical, supportive, challenging,
reflective, encouraging, neutral, collaborative, authoritative,
offensive, dismissive, condescending, patronizing, invalidating
- `session`: 1-20 (the number of sessions the patient and therapist have had together)
- `conversation_length`: the requested conversation length (the actual length most likely differs, because the requested value was not strictly enforced during generation)
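The fingerprint fields lend themselves to simple filtering. The sketch below assumes `fingerprint` is a flat dictionary whose values are plain strings or integers (for example, `age` and `session` stored as integers); that storage format, the `matches` helper, and the two sample records are assumptions made for illustration.

```python
def matches(record, **criteria):
    """Return True if every criterion equals the corresponding fingerprint field."""
    fingerprint = record["fingerprint"]
    return all(fingerprint.get(key) == value for key, value in criteria.items())


# Hypothetical records; field values are drawn from the lists above.
records = [
    {"fingerprint": {"age": 34, "presenting_issue": "anxiety",
                     "therapy_modality": "cognitive-behavioral therapy", "session": 3}},
    {"fingerprint": {"age": 61, "presenting_issue": "grief",
                     "therapy_modality": "humanistic therapy", "session": 12}},
]

# Keep only conversations whose presenting issue is anxiety.
anxiety = [r for r in records if matches(r, presenting_issue="anxiety")]
print(len(anxiety))
```

Several criteria can be combined in one call, such as `matches(r, presenting_issue="grief", session=12)`.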
Provider: dzur658



