
BengaliEmpatheticConversationsCorpus : A Comprehensive Bengali Language Dataset for Mental Health Counseling

Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-27.
Download link:
https://data.mendeley.com/datasets/b3j5tkswm7
Description:
This is a corpus of empathetic conversations between counselors and patients. The dataset consists of 38,235 question-answer pairs with two additional features: topic and question title. In short, the corpus has 4 columns and 38,235 rows. The column names are: 1. "Topic", 2. "Question-Title", 3. "Question", and 4. "Answers". Because no dataset of empathetic responses was available in Bengali, we built this one from existing corpora in other languages, namely counsel-chat (English) and arabic-empathetic-conversations (Arabic); links are given in the related links section. More importantly, we generated additional conversations from real counselling conversations drawn from various sources. The selected corpora are publicly available. We translated them into Bengali and manually processed the text into a usable form, for example by removing HTML tags and unusable characters, to create a novel dataset. According to the author, this is the first and largest corpus of empathetic conversations in Bengali, though it has some limitations. It will help researchers working on Bengali text, and on mental health research in particular.
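The four-column layout described above can be checked before any further processing. The following is a minimal sketch, assuming the corpus has been exported to CSV with a header row in the documented column order; the listing does not state the file format, so that is an assumption, as are the function name `summarize_corpus` and the placeholder rows (which stand in for the real Bengali text).

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["Topic", "Question-Title", "Question", "Answers"]


def summarize_corpus(stream):
    """Validate the header and return (row count, rows per topic).

    `stream` is any text file object containing CSV data (assumed format).
    """
    reader = csv.DictReader(stream)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    topic_counts = Counter()
    row_count = 0
    for row in reader:
        row_count += 1
        topic_counts[row["Topic"]] += 1
    return row_count, topic_counts


# Tiny in-memory stand-in for the real file (placeholder values only).
sample = io.StringIO(
    "Topic,Question-Title,Question,Answers\n"
    "anxiety,title one,question one,answer one\n"
    "anxiety,title two,question two,answer two\n"
    "grief,title three,question three,answer three\n"
)
rows, topics = summarize_corpus(sample)
print(rows, dict(topics))  # 3 {'anxiety': 2, 'grief': 1}
```

On the real file, opened with `encoding="utf-8"` and `newline=""`, the row count should come out as 38,235 if the export matches the description.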
Created:
2024-01-31