Susastho adolescent Dataset_18.02
Source: DataCite Commons | Updated: 2025-02-18 | Indexed: 2025-01-06
Download link:
https://figshare.com/articles/dataset/Susastho_adolescent_Dataset_24_09_24_xlsx/27100795
Resource description:
Bangladesh faces significant development challenges, and adolescents are among the most vulnerable groups affected by them. Part of this vulnerability stems from prevailing attitudes that surround sexual, reproductive, and mental health with taboo and secrecy. This cultural reluctance discourages open discussion and leaves adolescents without knowledge that is crucial for their well-being. To address this, we plan to develop a chatbot that meets adolescents' informational needs.

To build this chatbot, we collected a dataset focused specifically on adolescents in the domains of Sexual, Reproductive, and Mental Health (SRMH). The dataset is intended to support a Bengali-language conversational agent that helps close adolescents' knowledge gaps on these topics and improves understanding within this group. The data are qualitative rather than quantitative: the dataset serves as a source of information and as the knowledge base from which the chatbot answers SRMH queries in Bengali. We collected the data manually from four sources: the web, consultation with domain experts, consultation with users, and crowdsourcing. Domain experts then preprocessed the data into two formats: question/answer pairs and query-versus-information pairs. Strict ethical guidelines were followed throughout curation, and the work was approved by the ethical approval committee of United International University (UIU).

The dataset is divided into two domains: Sexual & Reproductive Health and Mental Health. The raw data contain 1,966 Sexual & Reproductive Health entries and 470 Mental Health entries (2,436 in total), a split of 80.7% to 19.3%. After augmentation, the dataset grew to 7,278 entries, of which 5,849 are Sexual & Reproductive Health and 1,429 are Mental Health, a split of 80.4% to 19.6%. The distribution is therefore skewed toward Sexual & Reproductive Health, and the dataset is imbalanced.
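The percentages above follow directly from the stated entry counts. As a minimal sketch that assumes only the counts given in this description, the Python below recomputes each domain's share and the majority-to-minority ratio, and adds inverse-frequency class weights. The weighting step is not part of the dataset's documentation; it is one common heuristic for training on imbalanced data, shown for illustration only, and all function names are hypothetical.

```python
"""Class shares and imbalance ratio for the Susastho counts stated above."""

# Entry counts taken from the description; the augmented Mental Health
# count is derived as 7278 - 5849.
RAW = {"Sexual & Reproductive Health": 1966, "Mental Health": 470}
AUGMENTED = {"Sexual & Reproductive Health": 5849, "Mental Health": 7278 - 5849}


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each domain's share of the total as a percentage (1 decimal place)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def imbalance_ratio(counts: dict[str, int]) -> float:
    """Majority-class size divided by minority-class size."""
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: w_c = N / (K * n_c), the same formula scikit-learn
    uses for class_weight='balanced'. Not part of the dataset's method."""
    total, k = sum(counts.values()), len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


if __name__ == "__main__":
    for label, counts in (("raw", RAW), ("augmented", AUGMENTED)):
        print(label, class_shares(counts), f"ratio={imbalance_ratio(counts):.2f}")
        print("  weights:", inverse_frequency_weights(counts))
```

With these counts the majority-to-minority ratio is roughly 4.2:1 before augmentation and about 4.1:1 after, so augmentation left the class balance almost unchanged.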
Provider:
figshare
Created:
2024-09-25



