openmed-community/TheBlueScrubs-v1-fixed
Source: Hugging Face | Updated: 2025-08-29 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/openmed-community/TheBlueScrubs-v1-fixed
Description:
TheBlueScrubs-v1-fixed is a maintenance fork of the original TheBlueScrubs-v1 dataset that fixes a schema issue in the meta column. The fork omits the meta column but preserves the text field and its values. It contains curated medical text drawn from SlimPajama/RedPajama sources, filtered with logistic regression and evaluated with Llama-3.1-70B. The dataset is designed to provide a high-quality signal for training clinical language models and comprises 11,080,331 training texts totaling about 1 billion tokens (1.1 billion per the original Chinese listing).
Provider:
openmed-community



