LuangMV97/Empathetic_counseling_Dataset

Name: LuangMV97/Empathetic_counseling_Dataset
Creator: LuangMV97
Published: 2024-03-30 02:17:42
License: No description available

Hugging Face · Updated 2024-03-30 · Indexed 2024-05-25

Download link:

https://hf-mirror.com/datasets/LuangMV97/Empathetic_counseling_Dataset

Resource description:

---
dataset_info:
  features:
  - name: input
    dtype: string
  - name: label
    dtype: string
  splits:
  - name: train
    num_bytes: 9143613.730886951
    num_examples: 30937
  - name: test
    num_bytes: 4445059.587284399
    num_examples: 7736
  download_size: 10363922
  dataset_size: 13588673.31817135
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
task_categories:
- text-generation
tags:
- medical
---

# Dataset Card for Empathetic_counseling

Empathetic_counseling is a dataset intended for training conversational language models to generate text in empathetic and mental health counseling dialogues.

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

## Dataset Details

### Dataset Description

This dataset concatenates a subset of examples from the "empathetic_dialogues" dataset with a combination of "Amod/mental_health_counseling_conversations", "EmoCareAI/Psych8k" and "https://github.com/nbertagnolli/counsel-chat.git". It has two columns, "input" and "label": the first is a user utterance and the second is the response the model is expected to predict. Each example pairs an input describing a situation a person is experiencing, associated with a given emotion, with an output that is the empathetic or counseling response.

- **Language(s) (NLP):** English
- **License:** [More Information Needed]

## Uses

Empathetic_counseling is intended for training conversational language models on the text-generation task in empathetic and mental health counseling dialogues.

### Direct Use

Use cases:

- Chatbot.
- Virtual assistant.
- Emotional counseling conversations.

## Dataset Structure

The dataset has 38673 rows, split into 80% for "train" (30937) and 20% for "test" (7736).

The number of examples contributed by each source is as follows:

- empathetic_dialogues: train: 19880, test: 4970.
- Amod/mental_health_counseling_conversations: train: 2805, test: 702.
- EmoCareAI/Psych8k: train: 6549, test: 1638.
- nbertagnolli/counsel-chat (GitHub repository): train: 1703, test: 426.

## Dataset Creation

### Curation Rationale

The dataset was created to train an encoder-decoder model, with FacebookAI/roberta-base as the encoder and microsoft/DialoGPT-medium as the decoder, serving as the language model for the text-generation task of a master's final project.

#### Data Collection and Processing

Preprocessing consisted of removing unnecessary columns and missing values. The complete EmpatheticDialogues dataset was not used, in order to keep its row count better balanced against the rest of the combined dataset; the number of examples mentioned in its original paper was taken instead.

**APA:** [More Information Needed]

## Dataset Card Authors [optional]

The dataset author is Luis Angel Motta Valero, VIU student.

## Dataset Card Contact

For more information and contact: luisangel.motta@alumnos.viu.es or luchomotta97@gmail.com
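The counts reported under Dataset Structure can be cross-checked with a short script. The sketch below is only an illustration: the numbers are copied from the card, while the names `SPLIT_TOTALS`, `SOURCE_COUNTS`, `summed_by_split` and `train_fraction` are hypothetical helpers introduced here, not part of the dataset.

```python
# Counts copied from the "Dataset Structure" section of the card.
SPLIT_TOTALS = {"train": 30937, "test": 7736}
SOURCE_COUNTS = {
    "empathetic_dialogues": {"train": 19880, "test": 4970},
    "Amod/mental_health_counseling_conversations": {"train": 2805, "test": 702},
    "EmoCareAI/Psych8k": {"train": 6549, "test": 1638},
    "nbertagnolli/counsel-chat": {"train": 1703, "test": 426},
}


def summed_by_split(source_counts):
    """Add up the per-source counts for each split."""
    totals = {}
    for per_split in source_counts.values():
        for split, n in per_split.items():
            totals[split] = totals.get(split, 0) + n
    return totals


def train_fraction(split_totals):
    """Share of all rows that belong to the train split."""
    return split_totals["train"] / sum(split_totals.values())


if __name__ == "__main__":
    print(summed_by_split(SOURCE_COUNTS))
    print(round(train_fraction(SPLIT_TOTALS), 3))
```

The per-source counts add up to the stated split totals, and the train share is within a fraction of a percent of the stated 80%.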

Provider:

LuangMV97

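Summary of original information

Dataset overview

Dataset name

Empathetic_counseling

Dataset description

Empathetic_counseling is a dataset for training conversational language models to generate text in empathetic and mental health counseling dialogues. It was built by merging several source datasets: "empathetic_dialogues", "Amod/mental_health_counseling_conversations", "EmoCareAI/Psych8k" and "https://github.com/nbertagnolli/counsel-chat.git".

Dataset features

input: the user utterance; data type string.
label: the response the model is expected to predict; data type string.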
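A minimal sketch of how the two string columns might be consumed, assuming the Hugging Face `datasets` package is installed and the repository is still reachable under the id shown in this listing. The helper names `to_pair` and `iter_pairs` are hypothetical and not part of the dataset.

```python
from typing import Iterator, Mapping, Tuple


def to_pair(row: Mapping[str, object]) -> Tuple[str, str]:
    """Return (input, label) after checking that both fields are strings."""
    user_utterance = row["input"]
    expected_response = row["label"]
    if not isinstance(user_utterance, str) or not isinstance(expected_response, str):
        raise TypeError("both 'input' and 'label' must be strings")
    return user_utterance, expected_response


def iter_pairs(split: str = "train") -> Iterator[Tuple[str, str]]:
    """Yield (input, label) pairs from one split; needs network access."""
    # Imported lazily so to_pair() stays usable without the library installed.
    from datasets import load_dataset

    ds = load_dataset("LuangMV97/Empathetic_counseling_Dataset", split=split)
    for row in ds:
        yield to_pair(row)
```

Calling `next(iter_pairs("test"))` would return the first pair of the test split, provided the download succeeds.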

Dataset structure

Training set (train): 30937 examples, 9143613.730886951 bytes.
Test set (test): 7736 examples, 4445059.587284399 bytes.

Dataset size

Download size: 10363922 bytes.
Total dataset size: 13588673.31817135 bytes.
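The size figures can be related to each other with a little arithmetic. The sketch below uses the byte counts exactly as listed (fractional values included); the constant and function names are hypothetical.

```python
import math

# Byte counts copied from the listing above.
TRAIN_BYTES = 9143613.730886951
TEST_BYTES = 4445059.587284399
DATASET_SIZE = 13588673.31817135
DOWNLOAD_SIZE = 10363922


def to_mib(num_bytes: float) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


def splits_match_total() -> bool:
    """Check that the two split sizes add up to the total dataset size."""
    return math.isclose(TRAIN_BYTES + TEST_BYTES, DATASET_SIZE, rel_tol=1e-9)


def download_ratio() -> float:
    """Download size as a fraction of the total dataset size."""
    return DOWNLOAD_SIZE / DATASET_SIZE


if __name__ == "__main__":
    print(splits_match_total())
    print(round(to_mib(DATASET_SIZE), 2), "MiB total")
    print(round(download_ratio(), 2), "download / total")
```

The two split sizes sum to the listed total, and the download is roughly three quarters of the total size.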

Dataset configuration

Default configuration (default):
- Training data path: data/train-*
- Test data path: data/test-*
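The two path patterns above assign data files to splits by glob matching. As a rough illustration of the idea only, the sketch below uses Python's standard `fnmatch` module; it is not the Hub's own resolution logic, and the shard file names in the usage example are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the default configuration.
SPLIT_PATTERNS = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches the path, or None."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatchcase(path, pattern):
            return split
    return None


if __name__ == "__main__":
    # Hypothetical shard names, used only to exercise the patterns.
    for name in ("data/train-00000-of-00001.parquet",
                 "data/test-00000-of-00001.parquet",
                 "README.md"):
        print(name, "->", split_for(name))
```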

Task category

Text generation

Language

English

License

[More Information Needed]
