iqbalpurba26/health-topic-dataset
Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/iqbalpurba26/health-topic-dataset
---
language:
- "id"
pretty_name: "Health Forum Question Dataset"
tags:
- text-classification
- topic-classification
license: "apache-2.0"
task_categories:
- text-classification
---
# 🩺 Health Topic Question Dataset (Multilingual)
**Indonesia 🇮🇩 | Health Question Classification**
This dataset contains health-related questions collected from Indonesian online health forums. It is distributed in **CSV format** and is intended for training health topic classification models.
---
## 🔍 Dataset Overview
- **Languages**: Indonesian
- **Domain**: Health forum questions / informal health text
- **Format**: CSV (`.csv`)
- **Topics / Labels**:

  | Label | Topic |
  |-------|-------|
  | 0 | Allergy |
  | 1 | Medication |
  | 2 | Menstruation |

- **Source**: Indonesian online health forums
- **Collection Date**: Last sample collected in 2026
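For code that consumes this dataset, the label table above can be kept in one place as a pair of lookup dictionaries, the same `id2label`/`label2id` convention used in Hugging Face Transformers model configs. A minimal sketch; the constant and function names are illustrative and not part of the dataset:

```python
# Label IDs as listed in the dataset card (0-2).
# ID2LABEL, LABEL2ID and decode_labels are illustrative names, not part of the dataset.
ID2LABEL: dict[int, str] = {0: "Allergy", 1: "Medication", 2: "Menstruation"}
LABEL2ID: dict[str, int] = {name: idx for idx, name in ID2LABEL.items()}


def decode_labels(label_ids: list[int]) -> list[str]:
    """Map integer labels (e.g. model predictions) back to topic names."""
    unknown = [i for i in label_ids if i not in ID2LABEL]
    if unknown:
        raise ValueError(f"label IDs outside 0-2: {unknown}")
    return [ID2LABEL[i] for i in label_ids]


if __name__ == "__main__":
    print(decode_labels([0, 2, 1]))  # ['Allergy', 'Menstruation', 'Medication']
```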
---
## 💾 CSV Structure
- **Columns**:
  - `text`: Health-related question or text (string)
  - `label`: Health topic label (integer, 0–2)
- **Example Row** (English gloss: "Why do my hands feel hot after cooking?", labelled `0`, Allergy):
```csv
text,label
"Kenapa tangan saya terasa panas setelah memasak?",0
```
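A local copy of the CSV can also be read with Python's standard `csv` module, checking the two columns described above as it goes. A minimal sketch; `read_health_csv` is a hypothetical helper, and since the file name of the CSV inside the repository is not stated in this card, the usage parses an in-memory copy of the example row:

```python
import csv
import io
from typing import TextIO

NUM_LABELS = 3  # label IDs 0-2, as listed in the dataset card


def read_health_csv(f: TextIO) -> list[tuple[str, int]]:
    """Parse a CSV with `text` and `label` columns into (text, label) pairs.

    Raises ValueError if a column is missing, a label is not an integer,
    or a label falls outside 0-2.
    """
    reader = csv.DictReader(f)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")

    rows: list[tuple[str, int]] = []
    for i, row in enumerate(reader, start=1):  # i counts data rows, not the header
        try:
            label = int(row["label"])
        except (TypeError, ValueError):  # TypeError: the row has no label field at all
            raise ValueError(f"data row {i}: label {row['label']!r} is not an integer") from None
        if not 0 <= label < NUM_LABELS:
            raise ValueError(f"data row {i}: label {label} is outside 0-{NUM_LABELS - 1}")
        rows.append((row["text"], label))
    return rows


if __name__ == "__main__":
    # In-memory copy of the example row from this card
    sample = io.StringIO('text,label\n"Kenapa tangan saya terasa panas setelah memasak?",0\n')
    print(read_health_csv(sample))  # [('Kenapa tangan saya terasa panas setelah memasak?', 0)]
```

When reading from disk, open the file with `newline=""`, as the `csv` module documentation recommends, and an explicit encoding such as `encoding="utf-8"` (an assumption about how the file is encoded).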
## 📊 Intended Use Cases
- Training models for health topic classification
- Analyzing health questions on forums and social media
- Health content moderation systems
- Digital health assistants or chatbots
- Indonesian-language component of multilingual NLP pipelines for informal health text
---
## ⚠️ Limitations
- The dataset covers only the three defined labels: `["Allergy", "Medication", "Menstruation"]`
- Not optimized for:
  - Formal medical records
  - Very short or ambiguous questions
  - Highly code-mixed or complex language
- The dataset may contain biases inherited from the source forums
---
## ⚖️ Ethical Considerations
- The data comes from public forums; take the privacy of forum users into account
- Not intended to replace professional medical advice
- Human-in-the-loop review is recommended for sensitive applications
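Because the label set is closed, a classifier trained on this data will assign every input, including off-topic or ambiguous questions, to one of the three topics. One way to put the human-in-the-loop recommendation into practice is to send low-confidence predictions to a reviewer. A minimal sketch, assuming the model outputs one logit per label in the order 0-2; the threshold value and all names are hypothetical:

```python
import math

ID2LABEL = {0: "Allergy", 1: "Medication", 2: "Menstruation"}
# Hypothetical cut-off; tune it on held-out data for the actual model.
CONFIDENCE_THRESHOLD = 0.7


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: list[float], threshold: float = CONFIDENCE_THRESHOLD) -> tuple[str, str, float]:
    """Return (decision, topic, confidence); decision is "auto" or "human_review"."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # first index on ties
    decision = "auto" if probs[best] >= threshold else "human_review"
    return decision, ID2LABEL[best], probs[best]
```

A confident prediction such as `route([4.0, 0.0, 0.0])` is accepted automatically, while uniform logits like `route([0.0, 0.0, 0.0])` (confidence 1/3) go to review.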
---
## 💻 How to Load
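To load the dataset from the Hugging Face Hub with the `datasets` library:
```python
from datasets import load_dataset

# Download the dataset from the Hub and build a DatasetDict
dataset = load_dataset("iqbalpurba26/health-topic-dataset")

# Print the first example of the train split, a dict with "text" and "label" keys
print(dataset['train'][0])
```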
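This card does not say whether the repository provides splits other than `train`. If you need a held-out validation set for training experiments, a stratified split keeps each topic's share roughly equal in both parts. A minimal standard-library sketch; `stratified_split`, the 20% validation fraction, and the seed are illustrative choices, not values from the dataset author:

```python
import random
from collections import defaultdict


def stratified_split(
    rows: list[tuple[str, int]], val_fraction: float = 0.2, seed: int = 42
) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Split (text, label) pairs so each label keeps roughly the same share in both parts."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    by_label: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train: list[tuple[str, int]] = []
    val: list[tuple[str, int]] = []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy before shuffling
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

With the Hub copy loaded as shown above, the input pairs can be built with `list(zip(dataset["train"]["text"], dataset["train"]["label"]))`.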
## 📜 License
Released under the **Apache 2.0 License**.
Free for research and commercial use.
---
## 📚 Citation
```bibtex
@misc{iqbalpurba262026healthdataset,
author = {M. Iqbal Purba},
title = {Multilingual Health Topic Question Dataset},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/iqbalpurba26/health-topic-dataset}
}
```
---
Provided by: iqbalpurba26



