lblommesteyn/tigrinya-medical-healthcare-corpus

Name: lblommesteyn/tigrinya-medical-healthcare-corpus
Creator: lblommesteyn
Published: 2026-04-24 10:23:22
License: no description available

Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/lblommesteyn/tigrinya-medical-healthcare-corpus


Resource description:

This dataset contains **1,000 high-quality records** of **Tigrinya-language healthcare and medical content**. It targets a language and a domain that are both underserved: Tigrinya (ISO 639-3: tir) is spoken by about 7 million people in Eritrea and Ethiopia, yet healthcare materials in Tigrinya are severely lacking. It is one of the **first Tigrinya medical/healthcare corpora** on the Hugging Face Hub and is suitable for medical translation, health education, and clinical NLP development. Each record includes the Tigrinya text, an English translation, the domain (healthcare), a subdomain (the specific healthcare topic), and language codes, among other fields.
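As a rough illustration of how records with the fields described above might be used for translation work, here is a minimal Python sketch. The column names (`tigrinya_text`, `english_translation`, `domain`, `subdomain`, `source_lang`, `target_lang`) are assumptions made for this example, not confirmed by the listing, and the sample record is a placeholder rather than real corpus data. Check the dataset viewer for the actual schema before relying on any of these names.

```python
from typing import List, Optional, Tuple, TypedDict


class Record(TypedDict):
    # Hypothetical field names; the real column names may differ.
    tigrinya_text: str
    english_translation: str
    domain: str
    subdomain: str
    source_lang: str
    target_lang: str


def translation_pairs(
    records: List[Record], subdomain: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Return (Tigrinya, English) pairs, optionally limited to one subdomain."""
    return [
        (r["tigrinya_text"], r["english_translation"])
        for r in records
        if subdomain is None or r["subdomain"] == subdomain
    ]


# Placeholder record for illustration only; not taken from the corpus.
sample: List[Record] = [
    {
        "tigrinya_text": "<Tigrinya sentence>",
        "english_translation": "<English sentence>",
        "domain": "healthcare",
        "subdomain": "<subdomain>",
        "source_lang": "tir",
        "target_lang": "eng",
    }
]

pairs = translation_pairs(sample)
```

With real data, the records would presumably come from the Hugging Face `datasets` library (for example `load_dataset("lblommesteyn/tigrinya-medical-healthcare-corpus")`), assuming the repository loads with its default configuration; the filtering logic would stay the same once the field names are adjusted.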

Provider:

lblommesteyn