rntc/bb-tt-3-pretrain
Source: Hugging Face | Updated: 2025-09-25 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/rntc/bb-tt-3-pretrain
Description:
The Biomed-FR-v3 High-Quality Pretraining Dataset contains French biomedical text annotated with the `rntc/biomed-fr-v2-classifier` model across 20 different classification and regression tasks. The dataset comprises 2,782,686 samples and 41 columns covering 25 annotation tasks, with a focus on the biomedical and clinical domain. The data has been filtered so that 94.6% of it is suitable for pretraining. Key features include complete annotation coverage, quality metrics, a clinical focus, and a correct column order.
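With roughly 2.8 million samples, it may be convenient to stream the dataset and inspect a few rows before downloading everything. The following is a minimal sketch, not an official loading recipe: it assumes the Hugging Face `datasets` library is installed and that the repository exposes a `train` split (an assumption; the listing does not name the splits). The helpers `stream_dataset`, `peek`, and `column_names` are hypothetical names introduced here for illustration.

```python
from itertools import islice

# Dataset identifier taken from the listing above.
DATASET_ID = "rntc/bb-tt-3-pretrain"


def stream_dataset(dataset_id=DATASET_ID, split="train"):
    """Open the dataset in streaming mode (no full download).

    The split name "train" is an assumption; adjust it if the
    repository uses different split names.
    """
    # Imported lazily so the helpers below work without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(dataset_id, split=split, streaming=True)


def peek(rows, n=3):
    """Return the first n rows from any iterable of dict-like rows."""
    return list(islice(rows, n))


def column_names(rows):
    """Collect the sorted set of keys seen across the given rows."""
    names = set()
    for row in rows:
        names.update(row.keys())
    return sorted(names)
```

Calling `column_names(peek(stream_dataset()))` would then list the columns found in the first few rows, which can be compared against the 41 columns reported in the description.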
Provider:
rntc



