zhi01/RadGenome-ChestCT

Source: Hugging Face. Updated 2026-03-04; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/zhi01/RadGenome-ChestCT
Resource description:
---
title: RadGenome Chest CT Dataset
license: cc-by-4.0
extra_gated_prompt: >
  ## Terms and Conditions for Using the RadGenome Chest CT

  **1. Acceptance of Terms**

  Accessing and using the RadGenome Chest CT dataset implies your agreement to these terms and conditions copied from CT-RATE. If you disagree with any part, please refrain from using the dataset.

  **2. Permitted Use**

  - The dataset is intended solely for academic, research, and educational purposes.

  - Any commercial exploitation of the dataset without prior permission is strictly forbidden.

  - You must adhere to all relevant laws, regulations, and research ethics, including data privacy and protection standards.

  **3. Data Protection and Privacy**

  - Acknowledge the presence of sensitive information within the dataset and commit to maintaining data confidentiality.

  - Direct attempts to re-identify individuals from the dataset are prohibited.

  - Ensure compliance with data protection laws such as GDPR and HIPAA.

  **4. Attribution**

  - Cite the dataset and acknowledge the providers in any publications resulting from its use.

  - Claims of ownership or exclusive rights over the dataset or derivatives are not permitted.

  **5. Redistribution**

  - Redistribution of the dataset or any portion thereof is not allowed.

  - Sharing derived data must respect the privacy and confidentiality terms set forth.

  **6. Disclaimer**

  The dataset is provided "as is" without warranty of any kind, either expressed or implied, including but not limited to the accuracy or completeness of the data.

  **7. Limitation of Liability**

  Under no circumstances will the dataset providers be liable for any claims or damages resulting from your use of the dataset.

  **8. Access Revocation**

  Violation of these terms may result in the termination of your access to the dataset.

  **9. Amendments**

  The terms and conditions may be updated at any time; continued use of the dataset signifies acceptance of the new terms.

  **10. Governing Law**

  These terms are governed by the laws of the location of the dataset providers, excluding conflict of law rules.

  **Consent:**
extra_gated_fields:
  Name: text
  Institution: text
  Email: text
  I have read and agree with Terms and Conditions for using the RadGenome Chest CT and CT-RATE dataset: checkbox
configs:
  - config_name: grounded reports
    data_files:
      - split: train
        path: dataset/radgenome_files/train_region_report.csv
      - split: validation
        path: dataset/radgenome_files/validation_region_report.csv
  - config_name: grounded vqa
    data_files:
      - split: train
        path:
          - dataset/radgenome_files/train_vqa_abnormality.csv
          - dataset/radgenome_files/train_vqa_location.csv
          - dataset/radgenome_files/train_vqa_presence.csv
          - dataset/radgenome_files/train_vqa_size.csv
      - split: validation
        path:
          - dataset/radgenome_files/validation_vqa_abnormality.csv
          - dataset/radgenome_files/validation_vqa_location.csv
          - dataset/radgenome_files/validation_vqa_presence.csv
          - dataset/radgenome_files/validation_vqa_size.csv
  - config_name: case-level vqa
    data_files:
      - split: train
        path: dataset/radgenome_files/train_case_disorders.csv
      - split: validation
        path: dataset/radgenome_files/calidation_case_disorders.csv
---

## [RadGenome Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis](https://arxiv.org/pdf/2404.16754)

Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which underscores the need for open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities.

We introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE).
Specifically, we leverage the latest universal segmentation models and large language models to extend the original dataset (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual clues for reasoning during interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are all linked to reference segmentation masks, enabling models to associate visual evidence with textual explanations.

All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by allowing them to be trained to generate text from given segmentation regions, which was not attainable with previous related datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.

## Citing Us

If you use RadGenome Chest CT, please cite [CT-CLIP](https://arxiv.org/abs/2403.17834) and [our paper](https://arxiv.org/pdf/2404.16754).
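
## Example: Resolving Config Files

The `configs` block in the card metadata maps three config names to CSV files per split. The sketch below transcribes that mapping into Python and shows one possible way to fetch a split. It is an illustration, not an official loader: `CONFIG_FILES`, `files_for`, and `load_split` are hypothetical names, the CSV column layout is not described on this page, and downloading assumes you have accepted the gated terms and are logged in to the Hugging Face Hub.

```python
from typing import Dict, List

REPO_ID = "zhi01/RadGenome-ChestCT"  # repository named on this page
PREFIX = "dataset/radgenome_files"   # directory shared by every config path

# Config name -> split -> CSV files, transcribed from the card's `configs` block.
CONFIG_FILES: Dict[str, Dict[str, List[str]]] = {
    "grounded reports": {
        "train": [f"{PREFIX}/train_region_report.csv"],
        "validation": [f"{PREFIX}/validation_region_report.csv"],
    },
    "grounded vqa": {
        split: [
            f"{PREFIX}/{split}_vqa_{kind}.csv"
            for kind in ("abnormality", "location", "presence", "size")
        ]
        for split in ("train", "validation")
    },
    "case-level vqa": {
        "train": [f"{PREFIX}/train_case_disorders.csv"],
        # The card spells this file "calidation"; it is kept verbatim here
        # because it may be the actual file name in the repository.
        "validation": [f"{PREFIX}/calidation_case_disorders.csv"],
    },
}


def files_for(config: str, split: str) -> List[str]:
    """Return the CSV paths the card lists for one config and split."""
    try:
        return list(CONFIG_FILES[config][split])
    except KeyError:
        raise KeyError(f"unknown config/split: {config!r}/{split!r}") from None


def load_split(config: str, split: str):
    """Download the listed CSVs and concatenate them into one DataFrame.

    Assumption: needs network access, accepted dataset terms, and a prior
    Hub login. Files with differing columns are unioned by pandas.concat.
    """
    import pandas as pd
    from huggingface_hub import hf_hub_download

    frames = [
        pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=name, repo_type="dataset"))
        for name in files_for(config, split)
    ]
    return pd.concat(frames, ignore_index=True)
```

Calling `files_for("grounded vqa", "train")` returns the four training VQA files without touching the network, so the mapping can be checked before any download is attempted. `load_split` imports its dependencies lazily for the same reason.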

Provider: zhi01