
iqbalpurba26/health-topic-dataset

Source: Hugging Face | Updated: 2026-03-19 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/iqbalpurba26/health-topic-dataset
Resource description:
---
language:
- "id"
pretty_name: "Health Forum Question Dataset"
tags:
- text-classification
- topic-classification
license: "apache-2.0"
task_categories:
- text-classification
---

# 🩺 Health Topic Question Dataset (Multilingual)

**Indonesia 🇮🇩 | Health Question Classification**

This dataset contains health-related questions collected from Indonesian online health forums. It is available in **CSV format** and can be used to train health topic classification models.

---

## 🔍 Dataset Overview

- **Language**: Indonesian
- **Domain**: Health forum questions / informal health text
- **Format**: CSV (`.csv`)
- **Topics / Labels**:

| Label | Description |
|-------|-------------|
| 0 | Allergy |
| 1 | Medication |
| 2 | Menstruation |

- **Source**: Indonesian online health forums
- **Collection Date**: Last sample collected in 2026

---

## 💾 CSV Structure

- **Columns**:
  - `text`: Health-related question or text (string)
  - `label`: Health topic label (integer, 0–2)
- **Example Row**:

```csv
text,label
"Kenapa tangan saya terasa panas setelah memasak?",0
```

## 📊 Intended Use Cases

- Training models for health topic classification
- Analyzing health questions on forums and social media
- Health content moderation systems
- Digital health assistants or chatbots
- Multilingual NLP pipelines for informal health text

---

## ⚠️ Limitations

- The dataset only supports the defined labels: `["Allergy", "Medication", "Menstruation"]`
- Not optimized for:
  - Formal medical records
  - Very short or ambiguous questions
  - Highly code-mixed or complex language
- The dataset may contain biases inherited from the source forums

---

## ⚖️ Ethical Considerations

- Data comes from public forums; consider user privacy
- Not intended to replace professional medical advice
- Human-in-the-loop review is recommended for sensitive applications

---

## 💻 How to Load

To load the dataset from the Hugging Face Hub:

```python
from datasets import load_dataset

# Download the dataset from the Hub and print the first training example
dataset = load_dataset("iqbalpurba26/health-topic-dataset")
print(dataset['train'][0])
```

## 📜 License

Released under the **Apache 2.0 License**. Free for research and commercial use.

---

## 📚 Citation

```bibtex
@misc{iqbalpurba262026healthdataset,
  author    = {M. Iqbal Purba},
  title     = {Multilingual Health Topic Question Dataset},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/iqbalpurba26/health-topic-dataset}
}
```
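The label table and CSV structure described in the card can be checked with a short standard-library sketch. It assumes the file has exactly the `text` and `label` columns documented above. `LABEL_NAMES` and `read_rows` are illustrative names introduced here, not part of the dataset release.

```python
import csv
import io
from collections import Counter

# Label ids from the card's label table: 0 = Allergy, 1 = Medication, 2 = Menstruation.
LABEL_NAMES = ["Allergy", "Medication", "Menstruation"]


def read_rows(handle):
    """Parse `text,label` CSV rows and validate them against the documented schema.

    Hypothetical helper: raises ValueError on missing columns, empty text,
    or a label outside 0-2. Returns a list of (text, label) pairs.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or not {"text", "label"} <= set(reader.fieldnames):
        raise ValueError(f"expected columns 'text' and 'label', got {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        text = row["text"].strip()
        label = int(row["label"])
        if not text:
            raise ValueError(f"line {line_no}: empty text")
        if not 0 <= label < len(LABEL_NAMES):
            raise ValueError(f"line {line_no}: label {label} outside 0-{len(LABEL_NAMES) - 1}")
        rows.append((text, label))
    return rows


# The example row from the card, wrapped in an in-memory file for illustration.
sample = io.StringIO(
    "text,label\n"
    '"Kenapa tangan saya terasa panas setelah memasak?",0\n'
)
rows = read_rows(sample)
for text, label in rows:
    print(LABEL_NAMES[label], "|", text)  # → Allergy | Kenapa tangan saya terasa panas setelah memasak?

# Per-label counts help spot class imbalance when run on the full file.
print(Counter(LABEL_NAMES[label] for _, label in rows))
```

On the full CSV, the same helper would be called with an open file handle, e.g. `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved locally.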
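The loading example in the card reads a `train` split. If a held-out set is needed to evaluate a topic classifier, one option is a stratified split that keeps each label's share roughly the same in both parts. The card does not prescribe this. `stratified_split` is a hypothetical helper that works on `(text, label)` pairs.

```python
import random
from collections import defaultdict


def stratified_split(rows, valid_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and validation lists, label by label.

    Hypothetical helper: each label's rows are shuffled with a fixed seed and
    about `valid_fraction` of them go to validation, so label proportions stay
    close to those of the full set.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, valid = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_valid = round(len(group) * valid_fraction)
        if valid_fraction > 0 and len(group) >= 2:
            # Keep at least one row of this label on each side of the split.
            n_valid = min(max(n_valid, 1), len(group) - 1)
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid


# Placeholder rows for illustration only; they are not taken from the dataset.
demo = [(f"question {i}", i % 3) for i in range(10)]
train, valid = stratified_split(demo, valid_fraction=0.2)
print(len(train), len(valid))  # → 7 3
```

Because the split is rounded per label, very small classes can end up slightly over- or under-represented in validation. The clamp ensures every label with at least two rows appears on both sides.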
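The card's first intended use is training a health topic classifier. Before fine-tuning a larger model, a quick baseline could be multinomial Naive Bayes with add-one smoothing over lowercase word counts. This is a standard technique swapped in for illustration, not the author's method. The class name `NaiveBayesTopicClassifier` and the regex tokenizer are assumptions. Informal Indonesian forum text, with its slang and abbreviations, would likely benefit from better tokenization.

```python
import math
import re
from collections import Counter, defaultdict

LABEL_NAMES = ["Allergy", "Medication", "Menstruation"]


def tokenize(text):
    # Lowercase word tokens; a deliberately simple assumption for informal forum text.
    return re.findall(r"\w+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; hypothetical baseline."""

    def fit(self, rows):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, label in rows:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {label: sum(c.values()) for label, c in self.word_counts.items()}
        self.n_rows = sum(self.label_counts.values())
        return self

    def predict(self, text):
        # Tokens never seen during training are ignored.
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # log P(label) + sum of log P(word | label), with add-one smoothing.
            score = math.log(n / self.n_rows)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage sketch, assuming `train_rows` is a list of (text, label) pairs read from the CSV:
# model = NaiveBayesTopicClassifier().fit(train_rows)
# print(LABEL_NAMES[model.predict("Kenapa tangan saya terasa panas setelah memasak?")])
```

Because it only counts words, this baseline ignores word order and context. It is best treated as a sanity check on the labels rather than a final model.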

Provider:
iqbalpurba26