UMCU/DutchMedicalTextV2
Source: Hugging Face. Updated 2026-04-29. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/UMCU/DutchMedicalTextV2
Description:
DutchMedicalText v2 is a translated mix of PMC/PubMed, existing datasets from Apollo/Meditron, and various Dutch sources. Because the text was produced with neural machine translation, it may contain spurious repetitions; code examples for text cleaning are provided with the dataset. The dataset is intended primarily for text-generation and fill-mask tasks, in Dutch, in the medical domain. It falls in the 10M to 100M size category and is licensed under gpl-3.0.
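The kind of repetition cleanup mentioned above can be illustrated with a minimal sketch. This is not the cleaning code shipped with the dataset; the function name `collapse_repetitions` and the `max_ngram` parameter are hypothetical, and the approach (collapsing immediately repeated word sequences with a regular expression) is only one assumed way to handle repetitions introduced by machine translation.

```python
import re


def collapse_repetitions(text: str, max_ngram: int = 8) -> str:
    """Collapse immediately repeated word sequences of up to max_ngram words.

    Hypothetical helper for illustration; not the dataset's own cleaning code.
    """
    # Group 1 lazily captures 1..max_ngram words; the second group matches
    # one or more further copies of exactly that sequence, separated by whitespace.
    pattern = re.compile(
        r"(\b\w+(?:\s+\w+){0,%d}?)(?:\s+\1\b)+" % (max_ngram - 1)
    )
    # Keep a single copy of each repeated sequence.
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    sample = "de patiënt de patiënt de patiënt heeft koorts"
    print(collapse_repetitions(sample))
```

Note that this sketch also collapses legitimate doubled words (for example "dat dat" in Dutch), so it is best treated as a starting point and checked against a sample of the data before being applied in bulk.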
Provider:
UMCU



