IDinsight/urgency_detection_maternal_health_synthetic
Source: Hugging Face. Updated: 2024-08-22. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/IDinsight/urgency_detection_maternal_health_synthetic
Resource description:
---
license: mit
task_categories:
- text-classification
- question-answering
- zero-shot-classification
- text-generation
language:
- en
tags:
- urgency detection
- maternal health
size_categories:
- 10K<n<100K
source_datasets:
- original
pretty_name: Urgency Detection for Maternal Health
---
# Maternal Health Messages for Urgency Detection
This dataset contains ~12.8k synthetic user messages related to
[maternal health issues](https://saferbirth.org/aim-resources/aim-cornerstones/urgent-maternal-warning-signs-2/) generated using
[Gemini-1.5-Flash](https://deepmind.google/technologies/gemini/flash/) and verified using
[Gemini-1.5-Pro](https://deepmind.google/technologies/gemini/pro/).
This dataset can be used to train models for detecting urgent messages related to maternal health. One such model is the
[gemma-2-2b-it-ud](https://huggingface.co/IDinsight/gemma-2-2b-it-ud) model.
The prompts used to generate the dataset are contained in the `prompts_for_generating_user_messages.py` module.
## Motivation
Messages related to maternal health often contain sensitive or personally identifiable information (PII). Thus, it is not always possible or preferable to send such data
to third-party APIs, such as those serving OpenAI models, for analysis. In addition, maternal health messages from developing countries are difficult to obtain at scale,
which makes training custom models difficult.
To address both of these concerns, we created a synthetic dataset of user messages related to maternal health. We did this by
analyzing ~4k messages collected by [MomConnect](https://www.health.gov.za/momconnect/) to create a pool of user personas and message
characteristics (e.g., average message length, use of slang and emojis). We then randomly sampled user personas and message
characteristics and asked Gemini-1.5-Flash to generate a user message that aligns with a randomly chosen maternal health issue. Gemini-1.5-Pro
was then used to verify that the generated message matches the selected health issue. Any generated message that fails the verification
check is excluded from the final dataset.
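The sample, generate, verify loop described above can be sketched roughly as follows. This is a minimal illustration and not the actual pipeline: the persona, characteristic, and health-issue pools are hypothetical placeholders, and the `generate` and `verify` callables stand in for the Gemini-1.5-Flash and Gemini-1.5-Pro calls, whose real prompts live in `prompts_for_generating_user_messages.py`.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical pools for illustration only. The real pools were derived
# from the ~4k MomConnect messages and are not reproduced here.
PERSONAS = ["first-time mother, rural area", "mother of three, urban area"]
CHARACTERISTICS = ["short message with slang", "long message with emojis"]
HEALTH_ISSUES = ["severe headache", "heavy bleeding", "swelling of hands or face"]


@dataclass
class Sample:
    persona: str
    characteristics: str
    health_issue: str
    message: str


def generate_verified_message(
    generate: Callable[[str, str, str], str],
    verify: Callable[[str, str], bool],
    rng: random.Random,
) -> Optional[Sample]:
    """Sample a persona, characteristics, and issue; generate; then verify."""
    persona = rng.choice(PERSONAS)
    traits = rng.choice(CHARACTERISTICS)
    issue = rng.choice(HEALTH_ISSUES)
    # Stand-in for the generator model call (Gemini-1.5-Flash in the source).
    message = generate(persona, traits, issue)
    # Stand-in for the verifier model call (Gemini-1.5-Pro in the source).
    if not verify(message, issue):
        return None  # failed verification: excluded from the dataset
    return Sample(persona, traits, issue, message)


def build_dataset(
    generate: Callable[[str, str, str], str],
    verify: Callable[[str, str], bool],
    n_attempts: int,
    seed: int = 0,
) -> List[Sample]:
    """Run n_attempts generation attempts and keep only verified samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_attempts):
        sample = generate_verified_message(generate, verify, rng)
        if sample is not None:
            samples.append(sample)
    return samples


# Toy stand-ins so the sketch runs without any model API.
def fake_generate(persona: str, traits: str, issue: str) -> str:
    return f"({traits}) hi, I have {issue}, what should I do?"


def fake_verify(message: str, issue: str) -> bool:
    return issue in message


dataset = build_dataset(fake_generate, fake_verify, n_attempts=5)
```

Passing the generator and verifier as callables keeps the sampling and filtering logic independent of any particular model API, so the stand-ins can be swapped for real model calls without changing the loop.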
Provider:
IDinsight



