rntc/biomed-fr-v3-enriched-softmin-standard
Source: Hugging Face | Updated: 2025-10-06 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/rntc/biomed-fr-v3-enriched-softmin-standard
Description:
biomed-fr-v3-enriched-softmin-standard is a quality-upsampled version of the rntc/biomed-fr-v3-enriched dataset, built with soft-min bottleneck sampling and containing 2,941,107 samples. It covers the medical and biomedical domains, is written in French, and has been quality-filtered. Preprocessing consists of three steps: computing a soft-min over each sample's quality scores, deriving a sampling weight from that value, and resampling according to the weights. Four quality scores are used: educational score, content richness, terminology precision, and writing quality. Samples missing any of these scores are excluded.
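The listing does not give the exact formulas behind soft-min bottleneck sampling, so the following is only a minimal sketch of how the three steps (soft-min, weight computation, weighted resampling) could fit together. The temperature `tau`, the exponent `power`, the assumption that scores are positive numbers, and the column names in `SCORE_KEYS` are all hypothetical and not taken from the dataset card.

```python
import math
import random

# Hypothetical column names for the four quality scores named in the description.
SCORE_KEYS = ["educational_score", "content_richness",
              "terminology_precision", "writing_quality"]


def soft_min(scores, tau=0.5):
    """Smooth approximation of min(scores).

    Computes -tau * log(mean(exp(-s / tau))), which lies between the minimum
    and the mean of the scores and approaches the minimum as tau -> 0.
    """
    m = min(scores)
    # Shift by the minimum for numerical stability (log-sum-exp trick).
    total = sum(math.exp(-(s - m) / tau) for s in scores)
    return m - tau * math.log(total / len(scores))


def resample(records, score_keys=SCORE_KEYS, n_out=10, tau=0.5, power=2.0, seed=0):
    """Draw n_out records with replacement, favouring high soft-min scores.

    Assumes every score is a positive number. Records missing any score are
    dropped first, mirroring the exclusion rule in the description.
    """
    kept = [r for r in records
            if all(r.get(k) is not None for k in score_keys)]
    # Assumed weighting: soft-min raised to a power, so the weakest of the
    # four scores acts as the bottleneck on a sample's sampling weight.
    weights = [soft_min([r[k] for k in score_keys], tau) ** power for r in kept]
    rng = random.Random(seed)
    return rng.choices(kept, weights=weights, k=n_out)
```

Because the soft-min is dominated by the lowest score, a sample that is strong on three criteria but weak on one receives a low weight, which is presumably what "bottleneck" refers to here.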
Provider:
rntc



