rntc/biomed-fr-v3-enriched-softmin-tres_agressif_min2
Source: Hugging Face | Updated: 2025-10-06 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/rntc/biomed-fr-v3-enriched-softmin-tres_agressif_min2
Description:
The biomed-fr-v3-enriched-softmin-tres_agressif_min2 dataset is a quality-upsampled version of rntc/biomed-fr-v3-enriched, built with soft-min bottleneck sampling. During preprocessing, samples with a quality score below 2.0 were removed; the remaining samples were assigned weights using a specific formula and resampled so that the result keeps the original dataset size of 2,941,107 samples. The dataset is intended for text generation tasks and consists mainly of quality-filtered French text from the medical and biomedical domains.
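The filter-then-resample procedure described above can be illustrated with a minimal sketch. The listing does not give the actual weighting formula, so everything specific here is an assumption made for illustration: the `scores` field name, the log-mean-exp form of `soft_min`, the `beta` sharpness parameter, applying the 2.0 threshold to the soft-min value, and using that value directly as the sampling weight. It should not be read as the dataset authors' method.

```python
import math
import random


def soft_min(scores, beta=5.0):
    """Smooth approximation of min(scores); larger beta is closer to the hard min.

    Hypothetical formula: -1/beta * log(mean(exp(-beta * s))).
    """
    m = min(scores)
    # Shift by the minimum so the exponentials stay in [0, 1].
    mean_exp = sum(math.exp(-beta * (s - m)) for s in scores) / len(scores)
    return m - math.log(mean_exp) / beta


def resample(records, min_score=2.0, beta=5.0, seed=0):
    """Drop low-quality records, then draw with replacement back to the original size."""
    # Assumption: the threshold is applied to the soft-min of the per-record scores.
    scored = [(r, soft_min(r["scores"], beta)) for r in records]
    kept = [(r, q) for r, q in scored if q >= min_score]
    if not kept:
        raise ValueError("no record passes the quality threshold")
    population = [r for r, _ in kept]
    # Assumption: the sampling weight is the soft-min quality itself.
    weights = [q for _, q in kept]
    rng = random.Random(seed)
    return rng.choices(population, weights=weights, k=len(records))


# Toy records; "scores" is a hypothetical field holding several quality ratings.
records = [
    {"id": 0, "scores": [4.0, 4.5]},
    {"id": 1, "scores": [1.0, 5.0]},  # one weak score drags the soft-min below 2.0
    {"id": 2, "scores": [3.0, 3.0]},
]
resampled = resample(records)
```

In this sketch a record with one weak score is removed even if its other scores are high, which is the "bottleneck" behavior a soft-min is meant to capture, and the output has the same number of rows as the input because sampling is done with replacement.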
Provider:
rntc



