saramscruz/pt-health-text-complexity
Source: Hugging Face | Updated: 2025-12-17 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/saramscruz/pt-health-text-complexity
Description:
The Portuguese Health Text Complexity Dataset (PT-PT) is a curated dataset for text complexity classification in healthcare, focused on European Portuguese. It combines citizen-facing health communication from SNS 24 and professional clinical language from Direção-Geral da Saúde (DGS), allowing models to learn the distinction between clear, medium, and complex health-related texts. The dataset supports text classification, text complexity assessment, health communication analysis, readability and accessibility research, and multilingual transfer learning (PT → EN/ES/FR). It is provided as a JSONL file with fields including unique identifier, language code, domain, data source, raw text sample, and complexity label (clear, medium, complex). All texts follow European Portuguese conventions.
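As a rough illustration of the JSONL layout described above, the sketch below parses a few records and counts the complexity labels. The field names (`id`, `lang`, `domain`, `source`, `text`, `label`) and the exact label strings are assumptions based on the description, not the confirmed schema, and the three sample records are invented placeholders rather than rows from the dataset.

```python
import io
import json
from collections import Counter

# Assumed label strings; the real dataset may spell them differently.
ALLOWED_LABELS = {"clear", "medium", "complex"}


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def label_counts(records, label_field="label"):
    """Count records per complexity label, rejecting unexpected labels."""
    counts = Counter()
    for rec in records:
        label = rec[label_field]
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


# Invented placeholder records using the hypothetical field names.
SAMPLE = "\n".join(
    json.dumps(rec)
    for rec in [
        {"id": "ex-1", "lang": "pt-PT", "domain": "health",
         "source": "SNS24", "text": "Texto de exemplo 1.", "label": "clear"},
        {"id": "ex-2", "lang": "pt-PT", "domain": "health",
         "source": "SNS24", "text": "Texto de exemplo 2.", "label": "medium"},
        {"id": "ex-3", "lang": "pt-PT", "domain": "health",
         "source": "DGS", "text": "Texto de exemplo 3.", "label": "complex"},
    ]
)

records = list(read_jsonl(io.StringIO(SAMPLE)))
counts = label_counts(records)
print(dict(counts))
```

To try this against the real file, replace `io.StringIO(SAMPLE)` with an open file handle (UTF-8 encoded) and adjust the field names to match the actual schema on the dataset card.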
Provider:
saramscruz



