LuangMV97/Empathetic_counseling_Dataset
Source: Hugging Face. Updated 2024-03-30; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/LuangMV97/Empathetic_counseling_Dataset
Resource description:
---
dataset_info:
  features:
  - name: input
    dtype: string
  - name: label
    dtype: string
  splits:
  - name: train
    num_bytes: 9143613.730886951
    num_examples: 30937
  - name: test
    num_bytes: 4445059.587284399
    num_examples: 7736
  download_size: 10363922
  dataset_size: 13588673.31817135
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
task_categories:
- text-generation
tags:
- medical
---
# Dataset Card for Empathetic_counseling
Empathetic_counseling is a dataset for training conversational language models to generate text in empathetic and mental health counseling dialogues.
## Dataset Details
### Dataset Description
The dataset was built by concatenating a subset of examples from the "empathetic_dialogues" dataset with a combination of three counseling sources: "Amod/mental_health_counseling_conversations", "EmoCareAI/Psych8k", and "https://github.com/nbertagnolli/counsel-chat.git".
It has two columns, "input" and "label": the first holds a user utterance and the second holds the response the model is expected to predict. Each example pairs an input describing a situation a person is experiencing, tied to a given emotion, with an output that is the corresponding empathetic or counseling response.
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
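As a minimal sketch of how the "input" and "label" columns might be consumed, the snippet below assumes the Hugging Face `datasets` library is installed and that the repository ID from the download link is reachable. The helper names `load_splits` and `format_pair`, the separator, and the sample row are hypothetical and not part of the dataset.

```python
from typing import Dict

REPO_ID = "LuangMV97/Empathetic_counseling_Dataset"  # taken from the dataset URL


def load_splits():
    """Load the train/test splits from the Hub (requires network access)."""
    # Imported lazily so the rest of this sketch works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


def format_pair(row: Dict[str, str], sep: str = " => ") -> str:
    """Join a row's user utterance and expected response into one string.

    The separator is an illustrative choice, not something the dataset defines.
    """
    return f"{row['input'].strip()}{sep}{row['label'].strip()}"


# Hypothetical row, invented for illustration; not taken from the dataset.
example = {"input": "I failed my exam and feel awful. ", "label": "That sounds really hard."}
print(format_pair(example))
```

With network access, `load_splits()["train"][0]` should return a dict with the `input` and `label` keys, which can be passed straight to `format_pair`.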
## Uses
Empathetic_counseling is intended for training conversational language models on the text-generation task in empathetic and mental health counseling dialogues.
### Direct Use
Use cases:
- Chatbots
- Virtual assistants
- Emotional counseling conversations
## Dataset Structure
The dataset has 38673 rows, split 80% into "train" (30937) and 20% into "test" (7736). The number of examples contributed by each source is:
- empathetic_dialogues: train: 19880, test: 4970.
- Amod/mental_health_counseling_conversations: train: 2805, test: 702.
- EmoCareAI/Psych8k: train: 6549, test: 1638.
- nbertagnolli/counsel-chat (GitHub repository): train: 1703, test: 426.
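The per-source counts above can be checked against the stated split sizes with a few lines of arithmetic. This is only a consistency check on the numbers listed in this card; the function name `split_totals` is illustrative.

```python
# Per-source example counts as listed in the dataset card.
SOURCES = {
    "empathetic_dialogues": {"train": 19880, "test": 4970},
    "Amod/mental_health_counseling_conversations": {"train": 2805, "test": 702},
    "EmoCareAI/Psych8k": {"train": 6549, "test": 1638},
    "nbertagnolli/counsel-chat": {"train": 1703, "test": 426},
}


def split_totals(sources):
    """Sum the per-source counts for each split."""
    totals = {"train": 0, "test": 0}
    for counts in sources.values():
        for split, n in counts.items():
            totals[split] += n
    return totals


totals = split_totals(SOURCES)
grand_total = totals["train"] + totals["test"]
print(totals, grand_total)
print(f"train share: {totals['train'] / grand_total:.1%}")
```

The sums come to 30937 train and 7736 test examples, 38673 in total, which matches the figures in the front matter.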
## Dataset Creation
### Curation Rationale
The dataset was created to train an encoder-decoder model, with FacebookAI/roberta-base as the encoder and microsoft/DialoGPT-medium as the decoder, serving as the language model for the text-generation task of a master's final project.
#### Data Collection and Processing
Preprocessing consisted of removing unnecessary columns and rows with missing values.
Only part of the EmpatheticDialogues dataset was used, so that its row count is better balanced against the rest of the resulting dataset; the number of examples taken is the one mentioned in its original paper.
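The column and missing-value cleanup described in this section could look roughly like the sketch below. It is not the author's actual preprocessing code: the function name `clean_rows`, the extra "topic" column, the sample rows, and the choice to treat blank strings as missing are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional

KEEP = ("input", "label")  # the two columns the final dataset exposes


def clean_rows(rows: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Keep only the `input`/`label` columns and drop rows with missing values."""
    cleaned = []
    for row in rows:
        values = [row.get(col) for col in KEEP]
        # Treat None and blank strings as missing (an assumption of this sketch).
        if any(v is None or not v.strip() for v in values):
            continue
        cleaned.append(dict(zip(KEEP, values)))
    return cleaned


# Hypothetical raw rows; the extra "topic" column is invented for illustration.
raw = [
    {"input": "I feel anxious at work.", "label": "What triggers it?", "topic": "anxiety"},
    {"input": "I can't sleep.", "label": None, "topic": "sleep"},
    {"input": "   ", "label": "Tell me more.", "topic": "other"},
]
print(clean_rows(raw))
```

Only the first sample row survives, reduced to its `input` and `label` fields.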
## Dataset Card Authors
The dataset author is Luis Angel Motta Valero, a student at VIU.
## Dataset Card Contact
For more information, contact luisangel.motta@alumnos.viu.es or luchomotta97@gmail.com
Provided by:
LuangMV97
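Summary of original information
Dataset overview
Dataset name
Empathetic_counseling
Dataset description
Empathetic_counseling is a dataset for training conversational language models to generate text in empathetic and mental health counseling dialogues. It was built by merging several source datasets: "empathetic_dialogues", "Amod/mental_health_counseling_conversations", "EmoCareAI/Psych8k", and "https://github.com/nbertagnolli/counsel-chat.git".
Dataset features
- input: the user utterance; data type string.
- label: the response the model is expected to predict; data type string.
Dataset structure
- Training set (train): 30937 examples, 9143613.730886951 bytes.
- Test set (test): 7736 examples, 4445059.587284399 bytes.
Dataset size
- Download size: 10363922 bytes.
- Total dataset size: 13588673.31817135 bytes.
Dataset configuration
- Default configuration (default):
  - Training data path: data/train-*
  - Test data path: data/test-*
Task categories
- Text generation
Tags
- Medical
Language
- English
License
- [More Information Needed]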



