Comprehensive Multispecialty Medical Records Dataset
Source: Snowflake | Updated: 2024-08-12 | Indexed: 2024-08-13
Download link:
https://app.snowflake.com/marketplace/listing/GZT1Z1X6UF7
Resource description:
# General Overview
The table lists multiple specialties such as Allergy & Immunology, Anaesthesiology, Cardiology, Critical Care, Dentistry, Dermatology, Emergency Medicine, Endocrinology, ENT, Family Medicine, Gastroenterology, General Medicine, Geriatric Medicine, Haematology, Haematology-Oncology, Internal Medicine, Nephrology, Neurology, Neurosurgery, Obstetrics & Gynaecology, Occupational Medicine, Oncology, Ophthalmology, Oral Surgery, Orthopaedics, Osteopathic, Pain Medicine, Pathology, Paediatrics, Physiatric, Physical Therapy, Plastic Surgery, Podiatry, Preventive Medicine, Psychiatry, Psychotherapy, Pulmonology, Radiology, Rehabilitation, Rheumatology, Sleep Medicine, Speech Therapy, Sports Medicine, Surgery, Urology, Wound Care, and Unknown.
Each specialty is further broken down by work type, such as Clinic Note, Consultation, Letter, Operative Report, Progress Report, Radiology Report, and Transfer Summary.
For each work type within each specialty, the table gives the total character count and the total audio hours.
The Comprehensive Multispecialty Medical Records Dataset is a large, diverse compilation of medical records drawn from multiple specialties, intended as a resource for healthcare innovation. It is designed for researchers, data scientists, and healthcare professionals who need high-quality, real-world data to develop and validate machine learning models, improve clinical workflows, and support data-driven decision-making in healthcare.
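Because the table is broken down by specialty and then by work type, per-specialty and overall totals are simple sums over its rows. The following is a minimal sketch of that roll-up, assuming the table has been exported as rows with four fields; the `WorkTypeRow` class and its field names are hypothetical and not part of the listing.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkTypeRow:
    """One (specialty, work type) row of the summary table.

    Field names are hypothetical; adapt them to the actual export.
    """
    specialty: str
    work_type: str
    char_count: int
    audio_hours: float


def roll_up(rows: Iterable[WorkTypeRow]):
    """Sum character counts and audio hours per specialty and overall.

    Returns (per_specialty, grand_total), where per_specialty maps each
    specialty name to a (char_count, audio_hours) tuple and grand_total
    is a (char_count, audio_hours) tuple across all rows.
    """
    chars: dict[str, int] = defaultdict(int)
    hours: dict[str, float] = defaultdict(float)
    for row in rows:
        # Work types are summed into their parent specialty.
        chars[row.specialty] += row.char_count
        hours[row.specialty] += row.audio_hours
    per_specialty = {s: (chars[s], hours[s]) for s in chars}
    grand_total = (sum(chars.values()), sum(hours.values()))
    return per_specialty, grand_total
```

If the exported table is complete, the grand total should reproduce the Total Character Count and Total Audio Hours listed under Key Highlights; compare the hours with a tolerance (for example `math.isclose`) rather than exact equality, since they are floating-point sums.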
# Scope and Scale
- **Specialties Included:** The dataset spans 31 distinct medical specialties, from Cardiology and Internal Medicine to narrower fields such as Sleep Medicine, Psychiatry, and Rheumatology. Each specialty is represented by a range of document types, giving a broad view of patient care across medical disciplines.
- **Document Types:** The dataset includes a variety of work types, such as Clinic Notes, Consultations, Letters, Operative Reports, Progress Reports, Radiology Reports, and Transfer Summaries. This categorization supports in-depth analysis and modelling, making the dataset well suited to applications that require diverse clinical documentation.
- **Volume:** With more than 24.7 billion characters of text and nearly 272,000 hours of audio recordings, the dataset is suited to large-scale machine-learning projects and comprehensive clinical research.
## Key Highlights
- **Total Character Count:** 24,706,408,846
- **Total Audio Hours:** 271,997.465
- **Specialties:** Physician dictation and transcripts cover 31 specialties such as Cardiology, Internal Medicine, Family Medicine, Surgery, and others.
- **Dictation Audio Devices:** Audio was captured on a range of devices:
  - Telephone Dictation (54.3%)
  - Digital Recorder (24.9%)
  - Speech Mic (5.4%)
  - Smartphone (2.7%)
  - Unknown (12.7%)
- **Geographical Coverage:** The dataset includes dictation audio from physicians across nearly all US states, providing broad regional representation and variation in medical practice styles.
- **Physician Age Groups:** The dictating physicians range in age from 30 to over 70 years, offering a diverse perspective on clinical practice and communication styles:
  - 30-50 years (32%)
  - 50-70 years (54%)
  - 70+ years (13%)
# Compliance and Privacy
All audio and text records in the dataset have been de-identified in accordance with the HIPAA Safe Harbor guidelines, protecting patient privacy while preserving the integrity and usability of the data for research and development.
Provider:
Shaip AI Data
Created:
2024-08-02
Background overview:
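The dataset covers clinical records from 31 medical specialties, comprising 24.7 billion characters of text and about 272,000 hours of audio, collected from physicians of different age groups across nearly all US states. All data have been de-identified in compliance with HIPAA, making the dataset suitable for medical AI model development and clinical research.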
The summary above was collected and generated by 遇见数据集.



