medical-imaging
Hosted on the ModelScope community (魔搭社区). Updated 2025-11-27; indexed 2025-11-03.
Download link:
https://modelscope.cn/datasets/Kratos-AI/medical-imaging
Resource description:
# X-ray Reports Dataset
*This dataset contains high-quality (“A-grade”) anonymized X-ray images paired with radiology reports. It has been carefully curated, cleaned, and verified to ensure accuracy, completeness, and compliance with privacy standards (e.g., HIPAA/GDPR), making it suitable for high-stakes or research-grade model training.*
## Contact
For queries or collaborations related to this dataset, contact:
- anoushka@kgen.io
- abhishek.vadapalli@kgen.io
## Supported Tasks
- **Task Categories**:
  - Image Classification
  - Image-to-Text Generation
- **Supported Tasks**:
  - Radiology report generation from X-ray images
  - Multi-label classification of thoracic pathologies (e.g., pneumonia, cardiomegaly)
  - Medical image analysis for triage support
  - Cross-modal learning for vision-language models
  - Feature extraction for diagnostic AI research
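
For the multi-label classification task, each image can carry several pathology labels at once, so targets are usually encoded as multi-hot vectors and scored per label. The following is a minimal sketch under stated assumptions: the label vocabulary `LABELS` and the helper names are hypothetical, since this card does not document the dataset's actual label set or schema.

```python
# Hypothetical label vocabulary; the card names pneumonia and cardiomegaly
# only as examples, and "pleural_effusion" is added purely for illustration.
LABELS = ["pneumonia", "cardiomegaly", "pleural_effusion"]


def encode_labels(findings):
    """Turn an iterable of finding names into a multi-hot vector ordered like LABELS."""
    present = set(findings)
    return [1 if name in present else 0 for name in LABELS]


def per_label_f1(y_true, y_pred):
    """Compute F1 separately for each label from lists of multi-hot vectors."""
    scores = {}
    for j, name in enumerate(LABELS):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in pairs if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in pairs if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no positives and no predictions gets 0.0 by convention here.
        scores[name] = (2 * tp / denom) if denom else 0.0
    return scores


if __name__ == "__main__":
    y_true = [encode_labels(["pneumonia"]), encode_labels(["cardiomegaly"])]
    y_pred = [encode_labels(["pneumonia"]), encode_labels([])]
    print(per_label_f1(y_true, y_pred))
```

Reporting F1 per label rather than a single accuracy figure helps expose rare pathologies that a model may be ignoring.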
## Languages
- **Primary Language**: English (radiology reports)
## Dataset Creation
### Curation Rationale
This dataset was created to advance medical AI research by providing paired X-ray images and radiology reports for tasks like automated report generation and disease detection. It aims to support the development of robust, generalizable models for radiology.
### Source Data
- **Contributors**: De-identified data from hospital archives and public medical repositories
- **Collection Process**: Images were sourced from PACS (2015–2023) and reports were authored by board-certified radiologists; the data was anonymized to remove patient identifiers.
### Other Known Limitations
- **Size**: Limited to ~10,000 samples, which may restrict generalization
- **Demographic Bias**: Overrepresentation of adult urban patients; limited pediatric data
- **Image Quality**: Variations in X-ray resolution or equipment may affect consistency
- **Label Noise**: Potential errors in report-based labels extracted via NLP
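
To illustrate how report-based label extraction can introduce noise, the toy sketch below contrasts a naive keyword matcher with one that checks for a preceding negation cue. It is an assumption-laden example only: the card does not describe the NLP pipeline actually used, and `FINDINGS`, `NEGATION_CUES`, and both functions are hypothetical.

```python
import re

# Hypothetical keyword list and negation cues, chosen only for illustration.
FINDINGS = ["pneumonia", "cardiomegaly"]
NEGATION_CUES = re.compile(r"\b(no|without|negative for)\b")


def naive_labels(report):
    """Label a report with every finding whose keyword appears anywhere in it."""
    text = report.lower()
    return {finding for finding in FINDINGS if finding in text}


def negation_aware_labels(report):
    """Like naive_labels, but skip a finding when a negation cue precedes it
    within the same sentence."""
    labels = set()
    for sentence in re.split(r"[.;]\s*", report.lower()):
        for finding in FINDINGS:
            idx = sentence.find(finding)
            if idx == -1:
                continue
            if NEGATION_CUES.search(sentence[:idx]):
                continue
            labels.add(finding)
    return labels


if __name__ == "__main__":
    report = "No evidence of pneumonia. Mild cardiomegaly."
    print(naive_labels(report))
    print(negation_aware_labels(report))
```

Even the negation-aware version misses hedged or uncertain phrasing, which is one reason extracted labels should be spot-checked against the report text before training.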
## Intended Uses
### ✅ Direct Use
- Training and benchmarking models for radiology report generation
- Research in medical image-to-text generation
- Development of AI tools for radiology triage and decision support
- Academic research in medical imaging and natural language processing
### ❌ Out-of-Scope Use
- Clinical diagnosis without human radiologist oversight
- Commercial use without proper attribution or ethical review
- Applications violating patient privacy or medical ethics
- Real-time deployment without additional validation
## License
CC BY 4.0
Provider: maas
Created: 2025-10-04



