medical-imaging

ModelScope Community. Updated: 2025-11-27. Indexed: 2025-11-03.
Download link:
https://modelscope.cn/datasets/Kratos-AI/medical-imaging
Description:
# X-ray Reports Dataset

*This dataset contains high-quality (“A-grade”) anonymized X-ray images paired with radiology reports. It has been carefully curated, cleaned, and verified to ensure accuracy, completeness, and compliance with privacy standards (e.g., HIPAA/GDPR), making it suitable for high-stakes or research-grade model training.*

## Contact

For queries or collaborations related to this dataset, contact:

- anoushka@kgen.io
- abhishek.vadapalli@kgen.io

## Supported Tasks

- **Task Categories**:
  - Image Classification
  - Image-to-Text Generation
- **Supported Tasks**:
  - Radiology report generation from X-ray images
  - Multi-label classification of thoracic pathologies (e.g., pneumonia, cardiomegaly)
  - Medical image analysis for triage support
  - Cross-modal learning for vision-language models
  - Feature extraction for diagnostic AI research

## Languages

- **Primary Language**: English (radiology reports)

## Dataset Creation

### Curation Rationale

This dataset was created to advance medical AI research by providing paired X-ray images and radiology reports for tasks like automated report generation and disease detection. It aims to support the development of robust, generalizable models for radiology.

### Source Data

- **Contributors**: De-identified data from hospital archives and public medical repositories
- **Collection Process**: Images sourced from PACS systems (2015–2023), reports authored by board-certified radiologists, anonymized to remove patient identifiers.

### Other Known Limitations

- **Size**: Limited to ~10,000 samples, which may restrict generalization
- **Demographic Bias**: Overrepresentation of adult urban patients; limited pediatric data
- **Image Quality**: Variations in X-ray resolution or equipment may affect consistency
- **Label Noise**: Potential errors in report-based labels extracted via NLP

## Intended Uses

### ✅ Direct Use

- Training and benchmarking models for radiology report generation
- Research in medical image-to-text generation
- Development of AI tools for radiology triage and decision support
- Academic research in medical imaging and natural language processing

### ❌ Out-of-Scope Use

- Clinical diagnosis without human radiologist oversight
- Commercial use without proper attribution or ethical review
- Applications violating patient privacy or medical ethics
- Real-time deployment without additional validation

## License

CC BY 4.0
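
For the multi-label classification task listed under Supported Tasks, each image can carry several pathology labels at once, so targets are usually stored as multi-hot vectors rather than a single class index. The sketch below is illustrative only: the card does not document the label set or file layout, so the `PATHOLOGIES` vocabulary (limited to the two pathologies the card names) and both helper functions are assumptions, not part of the dataset.

```python
from typing import Iterable, List

# Hypothetical vocabulary: only these two pathologies are named in the card.
# The dataset's actual label set is not documented here.
PATHOLOGIES: List[str] = ["pneumonia", "cardiomegaly"]


def encode_multi_hot(findings: Iterable[str],
                     vocabulary: List[str] = PATHOLOGIES) -> List[int]:
    """Return a 0/1 vector with one slot per pathology in `vocabulary`."""
    # Normalize case and whitespace so "Pneumonia " matches "pneumonia".
    present = {f.strip().lower() for f in findings}
    return [1 if name in present else 0 for name in vocabulary]


def decode_multi_hot(vector: List[int],
                     vocabulary: List[str] = PATHOLOGIES) -> List[str]:
    """Map a multi-hot vector back to the pathology names it marks."""
    if len(vector) != len(vocabulary):
        raise ValueError("vector length must match vocabulary length")
    return [name for name, flag in zip(vocabulary, vector) if flag]
```

An image whose report mentions both findings encodes to a vector with both slots set, and a normal study encodes to all zeros. Because the card warns of label noise from NLP extraction, any labels encoded this way would still merit spot-checking against the report text.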

Provider: maas
Created: 2025-10-04