
vidulpanickan/TinyEHR

Hugging Face · Updated 2026-04-03 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/vidulpanickan/TinyEHR
Description:
---
license: odbl
language:
- en
size_categories:
- 1M<n<10M
multilinguality:
- monolingual
source_datasets:
- physionet/mimic-iv-demo
task_categories:
- table-question-answering
tags:
- medical
- clinical
- ehr
- mimic
- omop
- healthcare
- agentic
- clinical-notes
- clinical-nlp
- tabular
pretty_name: TinyEHR
configs:
- config_name: mimic_admissions
  data_files: "tinyehr_mimic_format/admissions.parquet"
- config_name: mimic_caregiver
  data_files: "tinyehr_mimic_format/caregiver.parquet"
- config_name: mimic_chartevents
  data_files: "tinyehr_mimic_format/chartevents.parquet"
- config_name: mimic_d_hcpcs
  data_files: "tinyehr_mimic_format/d_hcpcs.parquet"
- config_name: mimic_d_icd_diagnoses
  data_files: "tinyehr_mimic_format/d_icd_diagnoses.parquet"
- config_name: mimic_d_icd_procedures
  data_files: "tinyehr_mimic_format/d_icd_procedures.parquet"
- config_name: mimic_d_items
  data_files: "tinyehr_mimic_format/d_items.parquet"
- config_name: mimic_d_labitems
  data_files: "tinyehr_mimic_format/d_labitems.parquet"
- config_name: mimic_date_offsets
  data_files: "tinyehr_mimic_format/date_offsets.parquet"
- config_name: mimic_datetimeevents
  data_files: "tinyehr_mimic_format/datetimeevents.parquet"
- config_name: mimic_diagnoses_icd
  data_files: "tinyehr_mimic_format/diagnoses_icd.parquet"
- config_name: mimic_drgcodes
  data_files: "tinyehr_mimic_format/drgcodes.parquet"
- config_name: mimic_emar
  data_files: "tinyehr_mimic_format/emar.parquet"
- config_name: mimic_emar_detail
  data_files: "tinyehr_mimic_format/emar_detail.parquet"
- config_name: mimic_hcpcsevents
  data_files: "tinyehr_mimic_format/hcpcsevents.parquet"
- config_name: mimic_icustays
  data_files: "tinyehr_mimic_format/icustays.parquet"
- config_name: mimic_ingredientevents
  data_files: "tinyehr_mimic_format/ingredientevents.parquet"
- config_name: mimic_inputevents
  data_files: "tinyehr_mimic_format/inputevents.parquet"
- config_name: mimic_labevents
  data_files: "tinyehr_mimic_format/labevents.parquet"
- config_name: mimic_microbiologyevents
  data_files: "tinyehr_mimic_format/microbiologyevents.parquet"
- config_name: mimic_noteevents
  data_files: "tinyehr_mimic_format/noteevents.parquet"
- config_name: mimic_omr
  data_files: "tinyehr_mimic_format/omr.parquet"
- config_name: mimic_outputevents
  data_files: "tinyehr_mimic_format/outputevents.parquet"
- config_name: mimic_patients
  data_files: "tinyehr_mimic_format/patients.parquet"
- config_name: mimic_pharmacy
  data_files: "tinyehr_mimic_format/pharmacy.parquet"
- config_name: mimic_poe
  data_files: "tinyehr_mimic_format/poe.parquet"
- config_name: mimic_poe_detail
  data_files: "tinyehr_mimic_format/poe_detail.parquet"
- config_name: mimic_prescriptions
  data_files: "tinyehr_mimic_format/prescriptions.parquet"
- config_name: mimic_procedureevents
  data_files: "tinyehr_mimic_format/procedureevents.parquet"
- config_name: mimic_procedures_icd
  data_files: "tinyehr_mimic_format/procedures_icd.parquet"
- config_name: mimic_provider
  data_files: "tinyehr_mimic_format/provider.parquet"
- config_name: mimic_services
  data_files: "tinyehr_mimic_format/services.parquet"
- config_name: mimic_transfers
  data_files: "tinyehr_mimic_format/transfers.parquet"
- config_name: omop_2b_concept
  data_files: "tinyehr_omop_format/2b_concept.parquet"
- config_name: omop_2b_concept_relationship
  data_files: "tinyehr_omop_format/2b_concept_relationship.parquet"
- config_name: omop_2b_vocabulary
  data_files: "tinyehr_omop_format/2b_vocabulary.parquet"
- config_name: omop_attribute_definition
  data_files: "tinyehr_omop_format/attribute_definition.parquet"
- config_name: omop_care_site
  data_files: "tinyehr_omop_format/care_site.parquet"
- config_name: omop_cdm_source
  data_files: "tinyehr_omop_format/cdm_source.parquet"
- config_name: omop_cohort
  data_files: "tinyehr_omop_format/cohort.parquet"
- config_name: omop_cohort_attribute
  data_files: "tinyehr_omop_format/cohort_attribute.parquet"
- config_name: omop_cohort_definition
  data_files: "tinyehr_omop_format/cohort_definition.parquet"
- config_name: omop_condition_era
  data_files: "tinyehr_omop_format/condition_era.parquet"
- config_name: omop_condition_occurrence
  data_files: "tinyehr_omop_format/condition_occurrence.parquet"
- config_name: omop_cost
  data_files: "tinyehr_omop_format/cost.parquet"
- config_name: omop_death
  data_files: "tinyehr_omop_format/death.parquet"
- config_name: omop_device_exposure
  data_files: "tinyehr_omop_format/device_exposure.parquet"
- config_name: omop_dose_era
  data_files: "tinyehr_omop_format/dose_era.parquet"
- config_name: omop_drug_era
  data_files: "tinyehr_omop_format/drug_era.parquet"
- config_name: omop_drug_exposure
  data_files: "tinyehr_omop_format/drug_exposure.parquet"
- config_name: omop_fact_relationship
  data_files: "tinyehr_omop_format/fact_relationship.parquet"
- config_name: omop_location
  data_files: "tinyehr_omop_format/location.parquet"
- config_name: omop_measurement
  data_files: "tinyehr_omop_format/measurement.parquet"
- config_name: omop_metadata
  data_files: "tinyehr_omop_format/metadata.parquet"
- config_name: omop_note
  data_files: "tinyehr_omop_format/note.parquet"
- config_name: omop_note_nlp
  data_files: "tinyehr_omop_format/note_nlp.parquet"
- config_name: omop_observation
  data_files: "tinyehr_omop_format/observation.parquet"
- config_name: omop_observation_period
  data_files: "tinyehr_omop_format/observation_period.parquet"
- config_name: omop_payer_plan_period
  data_files: "tinyehr_omop_format/payer_plan_period.parquet"
- config_name: omop_person
  data_files: "tinyehr_omop_format/person.parquet"
- config_name: omop_procedure_occurrence
  data_files: "tinyehr_omop_format/procedure_occurrence.parquet"
- config_name: omop_provider
  data_files: "tinyehr_omop_format/provider.parquet"
- config_name: omop_specimen
  data_files: "tinyehr_omop_format/specimen.parquet"
- config_name: omop_visit_detail
  data_files: "tinyehr_omop_format/visit_detail.parquet"
- config_name: omop_visit_occurrence
  data_files: "tinyehr_omop_format/visit_occurrence.parquet"
---

# TinyEHR

**v0.2.0** | [GitHub](https://github.com/vidulpanickan/TinyEHR) | [Website](https://tinyehr.org) | [PyPI](https://pypi.org/project/tinyehr/)

A `100` patient dataset of Electronic Health Records, built for learning, experimenting, and prototyping healthcare data tools and AI agentic systems.

Typically, working with real healthcare data requires credentialing and data access agreements. TinyEHR is free to use.

> **This dataset is for learning, prototyping, and exploration only. It should not be used for clinical analysis, medical decision-making, or patient care.**

This dataset is derived from real EHR data from Beth Israel Deaconess Medical Center (BIDMC) in Boston, US. The data has been de-identified, meaning it has been stripped of any information that could identify the patient such as names, medical record numbers, and addresses to protect patient privacy. This dataset contains no protected health information (PHI).

| Stat | Value |
|------|-------|
| Patients | 100 |
| Hospital admissions | 275 |
| ICU stays | 140 |
| Clinical notes | 4,580 |
| Gender | 43 F / 57 M |
| Date range | 2011 - 2022 |
| Tables (MIMIC) | 33 |
| Tables (OMOP) | 32 |

**Browse the dataset**: explore 30+ tables, column definitions, and relationships across MIMIC-IV and OMOP formats.

[![TinyEHR Schema Explorer](https://raw.githubusercontent.com/vidulpanickan/TinyEHR/main/assets/tinyehr-explorer.png)](https://tinyehr.org)

**AI assisted SQL**: ask your queries in plain English.
[![TinyEHR AI SQL](https://raw.githubusercontent.com/vidulpanickan/TinyEHR/main/assets/tinyehr-ai-sql.png)](https://tinyehr.org)

## Quick Start

```python
from datasets import load_dataset

patients = load_dataset("vidulpanickan/TinyEHR", "mimic_patients")
admissions = load_dataset("vidulpanickan/TinyEHR", "mimic_admissions")
notes = load_dataset("vidulpanickan/TinyEHR", "mimic_noteevents")
```

Also available as a Python package: `pip install tinyehr` ([PyPI](https://pypi.org/project/tinyehr/))

## What does the data look like?

**patients** (`subject_id` = patient ID, `anchor_age` = age at anchor year, `dod` = date of death):

```json
{
  "subject_id": 10014729,
  "gender": "F",
  "anchor_age": 21,
  "anchor_year": 2013,
  "anchor_year_group": "2011 - 2013",
  "dod": null
}
```

**noteevents** (`hadm_id` = hospital admission ID, `note_type` = type of clinical note):

```json
{
  "note_id": "10014729-DS-0001",
  "subject_id": 10014729,
  "hadm_id": 23300884,
  "note_type": "Discharge summary",
  "chartdate": "2013-03-19",
  "text": "Admission Date: 2013-03-19 Discharge Date: 2013-03-28\n\nDOB: 1992 Sex: F\n\nService: VSURG → CSURG\n\nAttending: Dr. Katriel Silvane\n\nALLERGIES: NKDA\n\nCC: Postop wound infection s/p thoracotomy..."
}
```

There are 30+ tables covering admissions, diagnoses, lab results, medications, procedures, vitals, clinical notes, and more. Explore all tables at [tinyehr.org](https://tinyehr.org).

## Two Formats

| Format | Tables | Rows | Best for |
|--------|--------|------|----------|
| `tinyehr_mimic_format` | 33 | ~1.4M | Learning how hospital data works |
| `tinyehr_omop_format` | 32 | ~472K | Building tools that work across health systems |

**MIMIC-IV format** follows the original MIMIC-IV schema. If you're new to EHR data, start here.

**OMOP CDM v5.3.1 format** reorganizes the same data into a universal schema where diagnoses, labs, and medications are mapped to standardized medical vocabularies.
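The config names listed in the card metadata appear to follow a simple pattern: a `mimic_` or `omop_` prefix plus the table name, each pointing at one parquet file in the matching format directory. The sketch below captures that pattern so either format can be loaded by table name. The helper names (`config_name`, `data_file`, `load_table`) are hypothetical and not part of the `tinyehr` package, and `split="train"` assumes the default split that `datasets` assigns when a config lists a single data file.

```python
# Directory names are taken from the "Two Formats" table in this card.
FORMAT_DIRS = {
    "mimic": "tinyehr_mimic_format",
    "omop": "tinyehr_omop_format",
}


def config_name(fmt: str, table: str) -> str:
    """Build a config name such as 'mimic_patients' or 'omop_person'."""
    if fmt not in FORMAT_DIRS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMAT_DIRS)}")
    return f"{fmt}_{table}"


def data_file(fmt: str, table: str) -> str:
    """Build the repo-relative parquet path that the config points at."""
    if fmt not in FORMAT_DIRS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMAT_DIRS)}")
    return f"{FORMAT_DIRS[fmt]}/{table}.parquet"


def load_table(fmt: str, table: str):
    """Load one table from the Hub (hypothetical helper; needs network access)."""
    # Imported lazily so the naming helpers work without `datasets` installed.
    from datasets import load_dataset

    # Assumption: a single-file config is exposed as the default "train" split.
    return load_dataset("vidulpanickan/TinyEHR", config_name(fmt, table), split="train")
```

With `datasets` installed and network access, `load_table("omop", "person")` would be the OMOP-side counterpart of `load_table("mimic", "patients")`. Table names must match an entry in the card's `configs` list.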
Full details: [ABOUT_THE_DATA.md](https://github.com/vidulpanickan/TinyEHR/blob/main/ABOUT_THE_DATA.md)

## Usage

- Build and test AI agents that query, reason over, and navigate real hospital data
- Prototype clinical NLP and text-to-SQL systems against realistic clinical notes and multi-table schemas
- Learn how EHR data is structured across MIMIC-IV and OMOP formats

## Known Limitations

- **100 patients only**: this is a learning and prototyping dataset, not statistically representative of any population
- **Clinical notes are generated and not validated**: the notes were generated using Anthropic's Claude Opus 4.6, grounded in each patient's structured data during their hospital visit. They have not been validated by clinicians and may contain hallucinated or inaccurate clinical details (e.g., incorrect ages, fabricated findings, inconsistent timelines). They should not be treated as clinically accurate
- **Single institution**: all data comes from one US academic medical center (Beth Israel Deaconess Medical Center in Boston), so demographics and clinical patterns reflect this specific patient population
- **OMOP vocabulary subset**: the OMOP format uses a subset of the full OHDSI Athena vocabulary, limited to the concepts needed for these 100 patients

## Roadmap

- Synthetic clinical notes authored by clinicians (currently generated by LLM)
- Additional data modalities including medical imaging (X-ray, CT scan)

## Citation

If you use TinyEHR in your work, please cite:

```bibtex
@misc{tinyehr2026,
  title={TinyEHR: A 100 Patient Electronic Health Records Dataset for Learning and Prototyping Agentic AI},
  author={Vidul Ayakulangara Panickan},
  year={2026},
  url={https://github.com/vidulpanickan/TinyEHR}
}
```

## Source Citations

1. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV, a freely accessible electronic health record dataset. *Scientific Data*, 10(1), 1. https://doi.org/10.1038/s41597-022-01899-x
2. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV Clinical Database Demo (version 2.2). *PhysioNet*. https://doi.org/10.13026/dp1f-ex47
3. Kallfelz, M., Tsvetkova, A., Pollard, T., Kwong, M., Lipori, G., Huser, V., Osborn, J., Hao, S., & Williams, A. (2021). MIMIC-IV Demo Data in the OMOP Common Data Model (version 0.9). *PhysioNet*. https://doi.org/10.13026/p1f5-7x35

## License

[ODbL-1.0](https://opendatacommons.org/licenses/odbl/1-0/) (Open Data Commons Open Database License). Free to use, share, and modify. Redistributed versions must use the same license.
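To make the note-validation caveat under Known Limitations concrete, here is a minimal sketch of one sanity check: comparing the birth year stated in a generated note against the one implied by the structured `patients` record (`anchor_year - anchor_age`). It uses only the two sample records shown in this card. The `DOB: <year>` pattern is an assumption drawn from that single excerpt and may not hold for every note, and the function names and the one-year tolerance are illustrative choices, not part of TinyEHR.

```python
import re

# Sample records copied from "What does the data look like?" (note text is the excerpt only).
patient = {"subject_id": 10014729, "gender": "F", "anchor_age": 21, "anchor_year": 2013}
note = {
    "note_id": "10014729-DS-0001",
    "subject_id": 10014729,
    "text": "Admission Date: 2013-03-19 Discharge Date: 2013-03-28\n\nDOB: 1992 Sex: F\n\n"
            "Service: VSURG → CSURG",
}

# Assumed pattern: notes state the birth year as "DOB: YYYY".
DOB_PATTERN = re.compile(r"DOB:\s*(\d{4})")


def implied_birth_year(patient_row: dict) -> int:
    """Birth year implied by the structured record."""
    return patient_row["anchor_year"] - patient_row["anchor_age"]


def note_birth_year(text: str):
    """Birth year stated in the note text, or None if no DOB line is found."""
    match = DOB_PATTERN.search(text)
    return int(match.group(1)) if match else None


def dob_is_consistent(patient_row: dict, note_row: dict, tolerance: int = 1):
    """True/False if the note states a birth year, None if it does not."""
    stated = note_birth_year(note_row["text"])
    if stated is None:
        return None
    # Allow a small tolerance because age is in whole years (illustrative choice).
    return abs(stated - implied_birth_year(patient_row)) <= tolerance


print(dob_is_consistent(patient, note))
```

A check like this flags candidates for review only; passing it does not make a generated note clinically accurate.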
Provider:
vidulpanickan