IDinsight/urgency_detection_maternal_health_synthetic

Name: IDinsight/urgency_detection_maternal_health_synthetic
Creator: IDinsight
Published: 2024-08-22 18:50:43
License: No description available

Hugging Face · Updated 2024-08-22 · Indexed 2025-04-26

Download link:

https://hf-mirror.com/datasets/IDinsight/urgency_detection_maternal_health_synthetic

Description:

---
license: mit
task_categories:
- text-classification
- question-answering
- zero-shot-classification
- text-generation
language:
- en
tags:
- urgency detection
- maternal health
size_categories:
- 10K<n<100K
source_datasets:
- original
pretty_name: Urgency Detection for Maternal Health
---

# Maternal Health Messages for Urgency Detection

This dataset contains ~12.8k synthetic user messages related to [maternal health issues](https://saferbirth.org/aim-resources/aim-cornerstones/urgent-maternal-warning-signs-2/), generated using [Gemini-1.5-Flash](https://deepmind.google/technologies/gemini/flash/) and verified using [Gemini-1.5-Pro](https://deepmind.google/technologies/gemini/pro/). The dataset can be used to train models that detect urgent messages related to maternal health. One such model is [gemma-2-2b-it-ud](https://huggingface.co/IDinsight/gemma-2-2b-it-ud).

The prompts used to generate the dataset are contained in the `prompts_for_generating_user_messages.py` module.

## Motivation

Messages related to maternal health often contain sensitive information, including PII. Thus, it is not always possible or preferable to send such data to third-party APIs, such as OpenAI models, for analysis. In addition, maternal health messages from developing countries are difficult to obtain at scale, which makes training custom models difficult.

To address both of these concerns, we created a synthetic dataset of user messages related to maternal health. We first analyzed ~4k messages collected by [MomConnect](https://www.health.gov.za/momconnect/) to build a pool of user personas and message characteristics (e.g., average message length, use of slang or emojis). We then randomly sampled user personas and message characteristics and asked Gemini-1.5-Flash to generate a user message that aligns with a randomly chosen maternal health issue. Gemini-1.5-Pro was then used to verify that the generated message matches the selected health issue. If a generated message fails the verification check, it is excluded from the final dataset.
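The sample, generate, verify, and filter loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not IDinsight's actual pipeline: the persona, style, and health-issue pools are hypothetical placeholders (the real pools were derived from the MomConnect messages), and `fake_generate` / `fake_verify` are hypothetical stand-ins for the Gemini-1.5-Flash and Gemini-1.5-Pro calls, so no model API is used.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical placeholder pools; the real ones came from ~4k MomConnect messages.
PERSONAS = ["first-time mother", "rural mother of three", "young student"]
STYLES = ["short, uses emojis", "long, formal", "uses local slang"]
HEALTH_ISSUES = ["heavy bleeding", "severe headache", "trouble breathing"]


@dataclass
class SyntheticMessage:
    persona: str
    style: str
    issue: str
    text: str


def build_dataset(
    n_attempts: int,
    generate: Callable[[str, str, str], str],
    verify: Callable[[str, str], bool],
    rng: random.Random,
) -> List[SyntheticMessage]:
    """Sample persona/style/issue, generate a message, keep it only if it verifies."""
    kept = []
    for _ in range(n_attempts):
        persona = rng.choice(PERSONAS)
        style = rng.choice(STYLES)
        issue = rng.choice(HEALTH_ISSUES)
        text = generate(persona, style, issue)  # Gemini-1.5-Flash in the original
        if verify(text, issue):  # Gemini-1.5-Pro in the original
            kept.append(SyntheticMessage(persona, style, issue, text))
        # Messages that fail verification are simply dropped.
    return kept


def fake_generate(persona: str, style: str, issue: str) -> str:
    # Hypothetical stand-in for the generator model. It goes off-topic for
    # one persona so that the verifier has something to reject.
    if persona == "rural mother of three":
        return "hi, when is my next clinic visit?"
    return f"({style}) I'm a {persona} and I have {issue}, is this normal?"


def fake_verify(text: str, issue: str) -> bool:
    # Hypothetical stand-in for the verifier model: a plain substring check.
    return issue in text


if __name__ == "__main__":
    data = build_dataset(20, fake_generate, fake_verify, random.Random(0))
    print(f"kept {len(data)} of 20 candidates")
```

Passing the generator and verifier in as callables keeps the filtering logic independent of any particular model client, so the stubs could in principle be swapped for real model calls without changing `build_dataset`.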

Provided by:

IDinsight
