
LuangMV97/Empathetic_counseling_Dataset

Source: Hugging Face. Updated 2024-03-30; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/LuangMV97/Empathetic_counseling_Dataset
Summary:
---
dataset_info:
  features:
  - name: input
    dtype: string
  - name: label
    dtype: string
  splits:
  - name: train
    num_bytes: 9143613.730886951
    num_examples: 30937
  - name: test
    num_bytes: 4445059.587284399
    num_examples: 7736
  download_size: 10363922
  dataset_size: 13588673.31817135
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
task_categories:
- text-generation
tags:
- medical
---

# Dataset Card for Empathetic_counseling

Empathetic_counseling is a dataset intended for training conversational language models to generate text in empathetic and mental counseling dialogues.

This dataset card was generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

## Dataset Details

### Dataset Description

This dataset was built by concatenating a subset of examples from the "empathetic_dialogues" dataset with a dataset formed by combining "Amod/mental_health_counseling_conversations", "EmoCareAI/Psych8k" and the counsel-chat data from "https://github.com/nbertagnolli/counsel-chat.git".

It has two columns, "input" and "label": the first is a user utterance and the second is the response the model is expected to predict. Each source was adapted so that the input describes a situation a person is experiencing while feeling a given emotion, and the output is the corresponding empathetic or counseling response.

- **Language(s) (NLP):** English
- **License:** [More Information Needed]

## Uses

Empathetic_counseling is intended for training conversational language models on the text-generation task in empathetic and mental counseling dialogues.

### Direct Use

Use cases:

- Chatbots
- Virtual assistants
- Emotional counseling conversations

## Dataset Structure

The dataset has 38,673 rows, split 80% / 20% into "train" (30,937 rows) and "test" (7,736 rows).
The number of examples from each source subset is as follows:

- empathetic_dialogues: train: 19880, test: 4970
- Amod/mental_health_counseling_conversations: train: 2805, test: 702
- EmoCareAI/Psych8k: train: 6549, test: 1638
- nbertagnolli/counsel-chat (GitHub repository): train: 1703, test: 426

## Dataset Creation

### Curation Rationale

The dataset was created to train an encoder-decoder model, with FacebookAI/roberta-base as the encoder and microsoft/DialoGPT-medium as the decoder, serving as the language model for the text-generation task of a master's final project.

#### Data Collection and Processing

Preprocessing consisted of removing unnecessary columns and rows with missing values. The complete EmpatheticDialogues dataset was not used, so that its row count stays better balanced against the rest of the combined dataset; instead, only the number of examples reported in its original paper was taken (19,880 + 4,970 = 24,850 rows here).

## Citation

**APA:** [More Information Needed]

## Dataset Card Authors

The dataset author is Luis Angel Motta Valero, a VIU student.

## Dataset Card Contact

For more information, contact luisangel.motta@alumnos.viu.es or luchomotta97@gmail.com.
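The preprocessing described under Data Collection and Processing (dropping unneeded columns and rows with missing values, then capping EmpatheticDialogues at 24,850 rows) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the author's script: the `Context`/`Response` column names, the helper names `clean_rows` and `subsample`, and the fixed-seed random sampling are all hypothetical, and each real source uses its own column names.

```python
import random
from typing import Dict, List, Optional

# Hypothetical raw row type: column name -> value (None marks a missing value).
RawRow = Dict[str, Optional[str]]


def clean_rows(rows: List[RawRow], input_key: str, label_key: str) -> List[Dict[str, str]]:
    """Keep only the prompt/response columns (renamed to input/label) and drop rows with missing or blank values."""
    cleaned = []
    for row in rows:
        text_in = (row.get(input_key) or "").strip()
        text_out = (row.get(label_key) or "").strip()
        if not text_in or not text_out:
            continue  # missing value in either column: drop the row
        cleaned.append({"input": text_in, "label": text_out})
    return cleaned


def subsample(rows: List[Dict[str, str]], target: int, seed: int = 42) -> List[Dict[str, str]]:
    """Draw `target` rows without replacement, reproducibly, so one large source does not dominate the mix."""
    if target >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, target)


# EmpatheticDialogues share in this dataset, per the card: 19880 train + 4970 test.
EMPATHETIC_DIALOGUES_TARGET = 19880 + 4970  # = 24850

# Illustrative rows (not taken from the dataset); extra columns such as source_id are discarded.
raw = [
    {"Context": "I can't sleep before exams.", "Response": "That sounds stressful.", "source_id": "1"},
    {"Context": None, "Response": "Orphan response"},
    {"Context": "   ", "Response": "Blank prompt"},
]
print(clean_rows(raw, "Context", "Response"))  # only the first row survives
```

A seeded `random.Random` keeps the subsample reproducible across runs; the card does not say how the EmpatheticDialogues rows were actually selected.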
Provider:
LuangMV97
Summary of Original Information

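Dataset Overview

Dataset Name

Empathetic_counseling

Dataset Description

Empathetic_counseling is a dataset for training conversational language models to generate text in empathetic and mental counseling dialogues. It merges several source datasets: "empathetic_dialogues", "Amod/mental_health_counseling_conversations", "EmoCareAI/Psych8k" and "https://github.com/nbertagnolli/counsel-chat.git".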

Dataset Features

  • input: the user utterance (string).
  • label: the response the model is expected to predict (string).
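The two columns map naturally onto a source/target pair. The sketch below shows two ways a row might be turned into training text; it is an illustration, not the author's pipeline. The encoder-decoder view matches the RoBERTa-encoder / DialoGPT-decoder setup the card describes, while the decoder-only view assumes the common DialoGPT convention of closing each turn with the GPT-2 end-of-sequence string `<|endoftext|>`. The function names and the example row are hypothetical.

```python
from typing import Dict, Tuple

# GPT-2-family tokenizers (DialoGPT included) use this string as their end-of-sequence token.
EOS_TOKEN = "<|endoftext|>"


def to_source_target(example: Dict[str, str]) -> Tuple[str, str]:
    """Encoder-decoder view: the user utterance feeds the encoder, the response is the decoder target."""
    return example["input"], example["label"]


def to_dialogue_text(example: Dict[str, str], eos: str = EOS_TOKEN) -> str:
    """Decoder-only view: concatenate both turns, closing each one with the EOS token."""
    return f"{example['input']}{eos}{example['label']}{eos}"


# Illustrative row, not taken from the dataset.
row = {"input": "I lost my job today.", "label": "I'm sorry to hear that. How are you coping?"}
print(to_source_target(row))
print(to_dialogue_text(row))
```

In practice the resulting strings would still need to be tokenized with the respective model's tokenizer; that step is omitted here.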

Dataset Structure

  • Training set (train): 30,937 examples, 9143613.730886951 bytes.
  • Test set (test): 7,736 examples, 4445059.587284399 bytes.
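The split sizes can be checked against the per-subset counts listed in the dataset card. The sketch below sums them and also tests one hypothesis: that each source was split 80/20 on its own with the test count rounded up, which is the rounding scikit-learn's `train_test_split` uses for a float `test_size`. The card does not state how the split was made, so a match only shows the figures are consistent with that procedure.

```python
import math
from typing import Dict, Tuple

# Per-subset (train, test) counts as listed in the dataset card.
SUBSETS: Dict[str, Tuple[int, int]] = {
    "empathetic_dialogues": (19880, 4970),
    "Amod/mental_health_counseling_conversations": (2805, 702),
    "EmoCareAI/Psych8k": (6549, 1638),
    "nbertagnolli/counsel-chat": (1703, 426),
}


def split_sizes(n: int, test_size: float = 0.2) -> Tuple[int, int]:
    """(n_train, n_test) with the test count rounded up, as scikit-learn does for a float test_size."""
    n_test = math.ceil(test_size * n)
    return n - n_test, n_test


train_total = sum(train for train, _ in SUBSETS.values())
test_total = sum(test for _, test in SUBSETS.values())
# True where a per-source 80/20 split reproduces the card's counts exactly.
per_subset_ok = {name: split_sizes(train + test) == (train, test) for name, (train, test) in SUBSETS.items()}

print(train_total, test_total, train_total + test_total)  # 30937 7736 38673
print(per_subset_ok)
print(f"test share: {test_total / (train_total + test_total):.4f}")
```

All four sources come out consistent: for example, counsel-chat has 2,129 rows, 20% of which is 425.8, rounded up to 426 test rows and leaving 1,703 for training.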

Dataset Size

  • Download size: 10363922 bytes (about 10.4 MB).
  • Total dataset size: 13588673.31817135 bytes (about 13.6 MB).
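A quick arithmetic check shows how these figures relate: the total dataset size is the sum of the two splits' byte counts, and the download (the data files fetched from the Hub) is roughly three quarters of that. The values below are copied from the card's `dataset_info` block, fractional byte counts included; the variable names are illustrative.

```python
import math

# Values copied from the dataset_info block (fractional byte counts as reported).
TRAIN_NUM_BYTES = 9143613.730886951
TEST_NUM_BYTES = 4445059.587284399
DATASET_SIZE = 13588673.31817135
DOWNLOAD_SIZE = 10363922

split_sum = TRAIN_NUM_BYTES + TEST_NUM_BYTES
# Compare with a relative tolerance, since these are floating-point values.
sizes_consistent = math.isclose(split_sum, DATASET_SIZE, rel_tol=1e-9)
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"sum of splits: {split_sum / 1e6:.2f} MB, dataset_size: {DATASET_SIZE / 1e6:.2f} MB")
print(f"consistent: {sizes_consistent}, download/dataset ratio: {download_ratio:.3f}")
```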

Dataset Configuration

  • Default config (default):
    • Training data path: data/train-*
    • Test data path: data/test-*
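The `data/train-*` and `data/test-*` entries are glob patterns that assign repository files to splits. The sketch below approximates that assignment with Python's `fnmatch`; the Hugging Face `datasets` library resolves these patterns with its own glob logic, so treat this as an illustration only. The shard file names are hypothetical, since the card does not list the actual files.

```python
from fnmatch import fnmatch
from typing import Dict, List

# The default config's data_files patterns, as declared in the card.
DATA_FILES: Dict[str, str] = {"train": "data/train-*", "test": "data/test-*"}


def assign_splits(paths: List[str], patterns: Dict[str, str] = DATA_FILES) -> Dict[str, List[str]]:
    """Group repository file paths by the first split whose glob pattern they match; unmatched files are ignored."""
    splits: Dict[str, List[str]] = {name: [] for name in patterns}
    for path in paths:
        for name, pattern in patterns.items():
            if fnmatch(path, pattern):
                splits[name].append(path)
                break
    return splits


# Hypothetical file listing; the real shard names are not given in the card.
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
]
print(assign_splits(repo_files))
```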

Task Category

  • Text generation

Tags

  • Medical

Language

  • English

License

  • [More Information Needed]