EssentialAI/eai-taxonomy-med-w-dclm-100b-sample
Source: Hugging Face. Updated: 2025-06-22. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/EssentialAI/eai-taxonomy-med-w-dclm-100b-sample
Description:
A high-quality medical dataset of 100 billion tokens, curated from web data using taxonomy-based filtering. A 12-category taxonomy is used to efficiently identify and extract high-quality medical content. The dataset achieves best or near-best performance across all medical evaluations and performs above chance on MedQA-USMLE.
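As a rough illustration of what taxonomy-based filtering means in general, the sketch below keeps only documents whose assigned label falls in a chosen set. It is a toy under stated assumptions, not the provider's pipeline: the `Doc` class, the `label` field, the label names, and `filter_medical` are all hypothetical, and the 12 categories of the real taxonomy are not named in this listing.

```python
from dataclasses import dataclass

# Hypothetical label set for illustration only; the real taxonomy's
# 12 categories are not listed on this page.
MEDICAL_LABELS = frozenset({"medicine", "health"})


@dataclass
class Doc:
    text: str
    label: str  # hypothetical taxonomy label, e.g. assigned by a classifier


def filter_medical(docs, keep=MEDICAL_LABELS):
    """Return the documents whose taxonomy label is in the keep set."""
    return [d for d in docs if d.label in keep]


docs = [
    Doc("Beta blockers reduce heart rate.", "medicine"),
    Doc("Top ten pasta recipes.", "cooking"),
    Doc("Symptoms of iron deficiency.", "health"),
]
kept = filter_medical(docs)
print([d.label for d in kept])
```

Filtering on a precomputed label keeps the selection step cheap: each document is labeled once, and different subsets can then be extracted by changing only the keep set.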
Provider:
EssentialAI



