hashan-7/medscript-drug-dataset
Source: Hugging Face. Updated 2026-04-14; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/hashan-7/medscript-drug-dataset
Resource description:
---
license: mit
language:
- en
tags:
- healthcare
- nlp
- ocr
size_categories:
- 10K<n<100K
---
# MedScript Drug Dataset
## Description
This dataset contains cleaned and consolidated medicine names collected from multiple sources. It is designed for use in prescription OCR post-processing and drug name matching.
## Structure
- drug_name: normalized medicine name (lowercase, cleaned)
## Sources
- product dataset
- drug_database.csv
- massive_drug_list.csv
- sl_drugs dataset
## Processing
- converted to lowercase
- trimmed whitespace
- removed duplicates
- removed invalid entries
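
The processing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the function name `normalize_drug_names` is hypothetical, and since the card does not define "invalid entries", the sketch assumes that means non-string values, empty strings, and strings with no alphabetic characters.

```python
from typing import Iterable, List


def normalize_drug_names(raw_names: Iterable[object]) -> List[str]:
    """Lowercase, trim, filter, and de-duplicate names, keeping first-seen order."""
    seen = set()
    cleaned: List[str] = []
    for name in raw_names:
        # Assumption: non-string values (e.g. None from a missing CSV cell) are invalid.
        if not isinstance(name, str):
            continue
        # Trim surrounding whitespace and convert to lowercase.
        candidate = name.strip().lower()
        # Assumption: an entry is invalid if it is empty or contains no letters.
        if not candidate or not any(ch.isalpha() for ch in candidate):
            continue
        # Remove duplicates, keeping the first occurrence.
        if candidate in seen:
            continue
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned


if __name__ == "__main__":
    sample = ["  Paracetamol ", "paracetamol", "", "12345", "Amoxicillin", None]
    print(normalize_drug_names(sample))
```

A stricter validity rule (for example, a minimum length or a character whitelist) could be swapped in at the marked line if the source data calls for it.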
## Use Case
- OCR correction
- fuzzy matching
- medical text normalization
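
As an illustration of the OCR correction and fuzzy matching use cases, the sketch below matches a noisy OCR token against a list of normalized names using Python's standard-library `difflib`. The function name `correct_drug_name`, the tiny in-memory vocabulary, and the `cutoff` value of 0.8 are all assumptions made for the example; in practice the vocabulary would be the `drug_name` column, and the threshold would need tuning against real prescriptions.

```python
import difflib
from typing import List, Optional


def correct_drug_name(
    ocr_text: str, vocabulary: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the closest vocabulary entry to the OCR text, or None if nothing is close."""
    # Apply the same normalization as the dataset: trim and lowercase.
    query = ocr_text.strip().lower()
    # get_close_matches ranks candidates by difflib's similarity ratio
    # and drops any candidate scoring below the cutoff.
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical stand-in for the dataset's drug_name column.
    vocabulary = ["paracetamol", "amoxicillin", "metformin"]
    print(correct_drug_name("Paracetam0l", vocabulary))
    print(correct_drug_name("xyzxyz", vocabulary))
```

For a vocabulary in the tens of thousands of entries, `difflib` may be slow per query; a dedicated fuzzy-matching library or an index over character n-grams would likely be a better fit, though that is outside what the card specifies.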
Provider:
hashan-7



