
openmed-community/TheBlueScrubs-v1-fixed

Source: Hugging Face. Updated: 2025-08-29. Indexed: 2025-09-13.
Download link:
https://hf-mirror.com/datasets/openmed-community/TheBlueScrubs-v1-fixed
Description:
TheBlueScrubs-v1-fixed is a maintenance fork of the original TheBlueScrubs-v1 dataset that fixes a schema issue in the meta column. The fork omits the meta column and preserves the text field and its values. It contains curated medical text drawn from SlimPajama/RedPajama sources, filtered with logistic regression and evaluated with Llama-3.1-70B. The dataset is designed for training clinical language models on high-quality text and comprises approximately 1 billion tokens across 11,080,331 training texts.
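
Because the corpus holds over 11 million texts, streaming it is usually more practical than downloading it whole. The following is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` library is installed, that the split is named `train`, and that each record exposes its content under a `text` key (the description mentions a text field, but the exact column and split names are assumptions). The helper names `stream_records` and `preview_texts` are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Repository ID taken from the download link above.
DATASET_ID = "openmed-community/TheBlueScrubs-v1-fixed"


def stream_records(dataset_id: str = DATASET_ID, split: str = "train") -> Iterator[Dict]:
    """Lazily yield records so the full corpus is never held in memory.

    The split name "train" is an assumption based on the description.
    """
    # Imported lazily so preview_texts() works without the library installed.
    from datasets import load_dataset

    return iter(load_dataset(dataset_id, split=split, streaming=True))


def preview_texts(records: Iterable[Dict], n: int = 3, width: int = 80) -> List[str]:
    """Return the first `n` values of the assumed `text` field, cut to `width` characters."""
    return [record["text"][:width] for record in islice(records, n)]
```

Calling `preview_texts(stream_records())` would fetch only the first few records over the network, which is a cheap way to confirm the schema before committing to a full training run.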
Provider:
openmed-community