TotalSegmentator segmentations and radiomics features for NCI Imaging Data Commons CT images

Source: Mendeley Data (updated 2024-06-26, indexed 2024-06-28)
Download link:
https://zenodo.org/records/12004521
Description:
This dataset provides volumetric segmentations of anatomic regions for a subset of the CT images available from NCI Imaging Data Commons (IDC) [1] (https://imaging.datacommons.cancer.gov/), generated automatically with the TotalSegmentator model v1.5.6 [2]. The initial release includes segmentations for the majority of the CT scans in the National Lung Screening Trial (NLST) collection [3], [4] already available in IDC. Direct link to open this analysis result dataset in IDC (available after the release of IDC v18): https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=TotalSegmentator-CT-Segmentations

For each CT series analyzed, we include the segmentations generated by TotalSegmentator, converted into DICOM Segmentation object format using dcmqi v1.3.0 [5], together with first order and shape features for each segmented region, as produced by pyradiomics v3.0.1 [6]. The radiomics features were converted to DICOM Structured Reporting documents following template TID1500 using dcmqi. The TotalSegmentator analysis of the NLST cohort was executed on the Terra platform [7]. The implementation of the workflow used for the analysis is available at https://github.com/ImagingDataCommons/CloudSegmentator [8].

Because the files are large, they are stored in cloud buckets maintained by IDC, and the files attached to this record are manifests that can be used to download the actual files. The GCP and AWS manifests can be used to download the corresponding files from the IDC Google Cloud Storage (GCS) or Amazon S3 (AWS) buckets free of charge, following the instructions in the IDC documentation: https://learn.canceridc.dev/data/downloading-data

Specifically, you will need to install the s5cmd command line tool on your computer (see instructions at https://github.com/peak/s5cmd#installation) and follow the manifest-specific download instructions that accompany the file list below. If you use the files referenced in the attached manifests, we ask you to cite this dataset and the preprint describing how it was generated [9].

Specific files included in the record are:

- totalsegmentator_ct_segmentations_aws.s5cmd.zip: compressed AWS-based manifest. To download the files described in the manifest, execute this command: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com run totalsegmentator_ct_segmentations_aws.s5cmd
- totalsegmentator_ct_segmentations_gcs.s5cmd.zip: compressed GCS-based manifest. To download the files described in the manifest, execute this command: s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com run totalsegmentator_ct_segmentations_gcs.s5cmd
- Gen3-based manifest (see details in https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
- firstorder and shape radiomics features extracted using pyradiomics, organized one file per segmented structure (see the README file in the zip file for details on how those are organized):
  - pyradiomics_features_csv.zip: saved in CSV format
  - pyradiomics_features_parquet.zip: saved in Parquet format
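The download step described above can also be scripted. The following is a minimal Python sketch, not part of the dataset record: it unpacks a zipped manifest and assembles the s5cmd command quoted in the description. The helper names (`extract_manifest`, `build_s5cmd_command`, `download`) are hypothetical, and the sketch assumes that each zip archive contains exactly one file ending in `.s5cmd` and that s5cmd is already installed and on the PATH.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path

# Endpoint URLs exactly as given in the dataset description.
ENDPOINTS = {
    "aws": "https://s3.amazonaws.com",
    "gcs": "https://storage.googleapis.com",
}


def extract_manifest(zip_path: Path, out_dir: Path) -> Path:
    """Unpack a zipped manifest and return the path of the .s5cmd file.

    Assumption: the archive holds exactly one file ending in ".s5cmd".
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if n.endswith(".s5cmd")]
        if len(names) != 1:
            raise ValueError(f"expected one .s5cmd file, found {names}")
        return Path(archive.extract(names[0], path=out_dir))


def build_s5cmd_command(manifest: Path, provider: str) -> list[str]:
    """Return the argument list for the download command from the description."""
    return [
        "s5cmd",
        "--no-sign-request",
        "--endpoint-url",
        ENDPOINTS[provider],  # "aws" or "gcs"
        "run",
        str(manifest),
    ]


def download(zip_path: Path, provider: str, work_dir: Path) -> None:
    """Extract the manifest into work_dir and run s5cmd from that directory."""
    if shutil.which("s5cmd") is None:
        raise RuntimeError("s5cmd is not installed or not on PATH")
    manifest = extract_manifest(zip_path, work_dir)
    command = build_s5cmd_command(manifest.resolve(), provider)
    subprocess.run(command, cwd=work_dir, check=True)
```

For example, `download(Path("totalsegmentator_ct_segmentations_aws.s5cmd.zip"), "aws", Path("idc_download"))` would extract the AWS manifest and start the transfer. Where the files end up depends on the destination paths written in the manifest itself, so it is worth inspecting the manifest before running a download of this size.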

Created:
2024-06-21