
saramscruz/pt-health-text-complexity

Source: Hugging Face. Updated 2025-12-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/saramscruz/pt-health-text-complexity
Description:

The Portuguese Health Text Complexity Dataset (PT-PT) is a curated dataset for text complexity classification in healthcare, focused on European Portuguese. It combines citizen-facing health communication from SNS 24 with professional clinical language from Direção-Geral da Saúde (DGS), so that models can learn to distinguish clear, medium, and complex health-related texts. The dataset supports text classification, text complexity assessment, health communication analysis, readability and accessibility research, and multilingual transfer learning (PT → EN/ES/FR). It is provided in JSONL format, with fields including a unique identifier, language code, domain, data source, raw text sample, and complexity label (clear, medium, complex). All texts follow European Portuguese conventions.
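As a rough illustration of working with a JSONL file of this shape, the sketch below parses one JSON object per line and tallies the complexity labels. The description names the fields only in prose, so the keys used here (`id`, `lang`, `text`, `label`) and the label strings are assumptions, and the two sample records are invented placeholders, not rows from the dataset.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Assumed label vocabulary, taken from the prose description; the exact
# strings stored in the file may differ.
EXPECTED_LABELS = {"clear", "medium", "complex"}


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def label_counts(records: Iterable[dict], label_key: str = "label") -> Counter:
    """Count records per complexity label.

    `label_key` is a hypothetical field name; adjust it to the real schema.
    """
    return Counter(rec.get(label_key) for rec in records)


# Invented placeholder records, used only to exercise the functions.
sample = [
    '{"id": "ex-1", "lang": "pt-PT", "text": "Beba água com frequência.", "label": "clear"}',
    '{"id": "ex-2", "lang": "pt-PT", "text": "A terapêutica anticoagulante requer monitorização.", "label": "complex"}',
]

counts = label_counts(iter_records(sample))
unexpected = set(counts) - EXPECTED_LABELS  # labels outside the assumed vocabulary
```

For a downloaded file, the same functions accept an open file handle, for example `label_counts(iter_records(open(path, encoding="utf-8")))`, where `path` is wherever the JSONL file was saved. Checking `unexpected` first is a cheap way to find out whether the assumed label strings match the real ones.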
Provider:
saramscruz