Jingy2000/multi-turn-counsel-chat
Source: Hugging Face (updated 2024-04-17, indexed 2024-06-12)
Download link:
https://hf-mirror.com/datasets/Jingy2000/multi-turn-counsel-chat
---
license: mit
language:
- en
size_categories:
- n<1K
---
# Multi-turn Counsel chat dataset
This dataset converts the questions and answers scraped from the CounselChat.com forum into multi-turn conversation data. The original dataset is [here](https://huggingface.co/datasets/nbertagnolli/counsel-chat).
We use gpt-4-0125-preview to convert the top-upvoted answer to every question in the Counsel Chat dataset into a multi-turn conversation.
## Dataset Details
- **Language(s) (NLP):** English
### Dataset Sources
CounselChat.com
[CounselChat dataset](https://huggingface.co/datasets/nbertagnolli/counsel-chat)
Thanks to:
Bertagnolli, N. Counsel Chat: Bootstrapping High-Quality Therapy Data. Available online: https://towardsdatascience.com/counsel-chat-bootstrapping-high-quality-therapy-data-971b419f33da
```
@misc{bertagnolli2020counsel,
title={Counsel chat: Bootstrapping high-quality therapy data},
author={Bertagnolli, Nicolas},
year={2020},
publisher={Towards Data Science. https://towardsdatascience. com/counsel-chat~…}
}
```
## Dataset Structure
There are 2 files in the dataset.
### all_dialogue_cleaned.json
- questionText: The body of the individual’s question to counselors
- answerText: The therapist's response to the question
- messages: A list of messages between client and counselor
  - role: Either client or counselor; turns alternate (counselor/client/counselor/client...)
  - content: The body of the message
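
The field layout above can be checked with a short script. The following is a minimal sketch, not the authors' tooling: it assumes the file is a top-level JSON list of records, and `load_records`, `validate_record`, and the sample record are hypothetical names and data made up for illustration.

```python
import json

# Role names taken from the field description above.
EXPECTED_ROLES = {"client", "counselor"}


def load_records(path):
    """Load records from a JSON file (assumes a top-level list of objects)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in ("questionText", "answerText", "messages"):
        if field not in record:
            problems.append(f"missing field: {field}")
    messages = record.get("messages", [])
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in EXPECTED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: content is not a string")
        # Consecutive messages should come from different speakers.
        if i > 0 and role == messages[i - 1].get("role"):
            problems.append(f"message {i}: role {role!r} repeats")
    return problems


# Hypothetical record, invented for illustration only (not real data).
sample_record = {
    "questionText": "I have trouble sleeping before exams.",
    "answerText": "Sleep problems around stressful events are common.",
    "messages": [
        {"role": "counselor", "content": "What brings you here today?"},
        {"role": "client", "content": "I can't sleep before exams."},
        {"role": "counselor", "content": "When did this start?"},
    ],
}

print(validate_record(sample_record))  # []
```

To check the real file, something like `[validate_record(r) for r in load_records("all_dialogue_cleaned.json")]` should work if the top-level-list assumption holds.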
### all_dialogue_llama_json
Contains only a list of messages.
- role: Either client or counselor; turns alternate (counselor/client/counselor/client...)
- content: The body of the message
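
The relationship between the two files can be pictured as dropping everything except the message list. This is a rough sketch under assumptions: the card does not say whether each entry in this file is a bare list or an object with a `messages` key, so the wrapper used here is a guess, and `to_messages_only` is a hypothetical helper rather than the authors' conversion code.

```python
def to_messages_only(record):
    """Keep only the role/content pairs of one record.

    Assumes the output entry is an object with a single "messages" key.
    """
    return {
        "messages": [
            {"role": message["role"], "content": message["content"]}
            for message in record["messages"]
        ]
    }


# Hypothetical full record, invented for illustration only (not real data).
full_record = {
    "questionText": "How do I cope with stress at work?",
    "answerText": "Start by noticing what triggers the stress.",
    "messages": [
        {"role": "counselor", "content": "What has work been like lately?"},
        {"role": "client", "content": "Very stressful. I feel overwhelmed."},
    ],
}

slim_record = to_messages_only(full_record)
print(list(slim_record.keys()))  # ['messages']
```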
## Dataset Creation
We use gpt-4-0125-preview to convert the top-upvoted answer to every question in the Counsel Chat dataset into a multi-turn conversation.
More details can be found in our GitHub repo: [AITherapist](https://github.com/Jingy2000/AITherapist)
### Curation Rationale
There is a lack of high-quality, open-source mental health data available for NLP research.
Most datasets contain single-turn conversation data.
This dataset seeks to help bridge that gap and provide some additional data of counselors interacting with patients in need.
### Personal and Sensitive Information
This data is not anonymized, so individuals' names can be found in the dataset. CounselChat.com allows therapists to advertise their clinics by providing sound, publicly available advice. The therapist names have been kept as part of the original dataset.
## Bias, Risks, and Limitations
This dataset is generated by gpt-4-0125-preview. We consulted a PhD in Counseling Psychology, who said that the conversations in the dataset are not exactly the same as real counseling sessions.
We may improve this in the future.
## Dataset Card Authors
Jingyuan Shi
[@Jingyuan](https://github.com/Jingy2000)
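Provider:
Jingy2000
Summary of original information
Multi-turn Counsel chat dataset overview
Basic dataset information
- License: MIT
- Language: English
- Size category: fewer than 1K
Dataset description
- Source: CounselChat.com
- Conversion: gpt-4-0125-preview is used to convert questions and answers from the CounselChat.com forum into multi-turn conversation data.
Dataset structure
- Files: two files
  - all_dialogue_cleaned.json
    - questionText: The content of the question the individual asks counselors
    - answerText: The counselor's answer to the question
    - messages: A list of messages between client and counselor
      - role: The role (client or counselor)
      - content: The message content
  - all_dialogue_llama_json
    - messages: A list of messages
      - role: The role (client or counselor)
      - content: The message content
Dataset creation
- Conversion tool: gpt-4-0125-preview
- Purpose: High-quality open-source mental health data is lacking. This dataset aims to fill that gap by providing additional data of counselors interacting with patients in need.
Sensitive information
- Not anonymized: The dataset may contain individuals' names.
Bias, risks, and limitations
- Generation bias: The dataset is generated by gpt-4-0125-preview and may differ from real counseling situations.
Dataset authors
- Author: Jingyuan Shi
- GitHub: @Jingy2000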



