
Training set of clinical meta-data

DataCite Commons (updated 2025-06-01; indexed 2024-07-25)
Download link:
https://figshare.com/articles/dataset/Training_set_of_clinical_meta-data/3586932/1
Description:
This MD Anderson Cancer Center set of anonymized, high-quality contrast-enhanced computed tomography (CT) scans represents a comparatively homogeneous, uniform cohort of 315 patients with oropharyngeal squamous cell carcinoma, with detailed clinical histories, consistent follow-up of more than 2 years, and known etiological/biological correlates (specifically, human papillomavirus [HPV] status). Our main objective is to assess and validate the radiomics workflows of challenge participants and the predictive capacity of their radiomics signatures.

The CT scans were imported from the patients' electronic medical records and were all acquired before the start of the radiation treatment course. All patients were treated with intensity-modulated radiation therapy (IMRT); some were also prescribed concurrent chemotherapy. We intended these CT scans to be as representative as possible of the original simulation CT scans used for treatment planning, for which, per our institutional policy, no contrast was injected.

Specifically, we posted one half of the CT files from the dataset, in DICOM-RT format, on the Kaggle in Class server system as a "training set". The DICOM-RT files were fully anonymized and include primary tumor and lymph node regions of interest segmented by expert physicians, to eliminate segmentation-related uncertainty for challenge participants. The primary oropharyngeal tumor was segmented in red. The metastatic cervical lymph nodes were segmented individually, rather than according to the nodal level classification system, and contoured in green.

Both the training and test sets include the following data for each DICOM-RT case:
- age
- gender
- race
- tumor side and subsite
- T-category
- N-category
- AJCC stage
- pathologic grade
- smoking status (in pack-years)

Challenge participants can also download a "test" set containing the remaining, randomly selected half of the dataset, in which HPV status is blinded.
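The red/green segmentation convention can be applied programmatically when sorting ROIs. In a DICOM RT Structure Set, each contour's display color is stored in the ROI Display Color attribute (3006,002A) as a three-valued "R\G\B" entry (pydicom exposes it as `ROIDisplayColor` on items of `ROIContourSequence`). The sketch below is a minimal, stdlib-only illustration that assumes this attribute is populated for every ROI. The helper names `parse_display_color` and `classify_roi_color` are hypothetical. Because the exact RGB values used by the annotators are not documented, it labels ROIs by their dominant color channel rather than by exact matches.

```python
from typing import Sequence, Tuple


def parse_display_color(value: str) -> Tuple[int, int, int]:
    """Parse a DICOM multi-valued IS string such as '255\\0\\0' into (R, G, B)."""
    parts = [p.strip() for p in value.split("\\")]
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {value!r}")
    r, g, b = (int(p) for p in parts)
    return (r, g, b)


def classify_roi_color(rgb: Sequence[int]) -> str:
    """Map an ROI display color to a structure type (hypothetical helper).

    Assumes the convention stated in the dataset description: red marks the
    primary oropharyngeal tumor, green marks metastatic cervical lymph nodes.
    Exact RGB values are not documented, so the dominant channel decides.
    """
    if len(rgb) != 3:
        raise ValueError("expected an (R, G, B) triple")
    r, g, b = (int(c) for c in rgb)
    if r > g and r > b:
        return "primary_tumor"
    if g > r and g > b:
        return "lymph_node"
    return "unknown"  # blue-dominant, gray, or tied colors fall outside the convention


if __name__ == "__main__":
    print(classify_roi_color(parse_display_color("255\\0\\0")))  # primary_tumor
    print(classify_roi_color((0, 200, 30)))  # lymph_node
```

Returning `"unknown"` instead of guessing keeps any ROI that does not follow the red/green convention visible for manual review.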
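Each case carries the same set of clinical fields, and the HPV label is blinded in the test set. It can therefore help to load the metadata into a typed record in which HPV status (and, if unrecorded, pack-years) may be missing. The sketch below assumes the metadata has been exported to CSV with one row per case. The column names (`case_id`, `smoking_pack_years`, `hpv_status`, and so on) and the `ClinicalRecord`/`load_records` names are hypothetical placeholders, since the actual headers of the figshare file are not given here. Adjust them to match the real file.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClinicalRecord:
    case_id: str                          # hypothetical identifier column
    age: int
    gender: str
    race: str
    tumor_side: str
    tumor_subsite: str
    t_category: str
    n_category: str
    ajcc_stage: str
    pathologic_grade: str
    smoking_pack_years: Optional[float]   # None if the cell is empty
    hpv_status: Optional[str]             # None when blinded (test set)


def _optional(value: Optional[str]) -> Optional[str]:
    """Return a stripped string, or None for missing or empty cells."""
    value = (value or "").strip()
    return value or None


def load_records(csv_text: str) -> List[ClinicalRecord]:
    """Parse CSV text with hypothetical column names into ClinicalRecord objects."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pack_years = _optional(row.get("smoking_pack_years"))
        records.append(
            ClinicalRecord(
                case_id=row["case_id"].strip(),
                age=int(row["age"]),
                gender=row["gender"].strip(),
                race=row["race"].strip(),
                tumor_side=row["tumor_side"].strip(),
                tumor_subsite=row["tumor_subsite"].strip(),
                t_category=row["t_category"].strip(),
                n_category=row["n_category"].strip(),
                ajcc_stage=row["ajcc_stage"].strip(),
                pathologic_grade=row["pathologic_grade"].strip(),
                smoking_pack_years=float(pack_years) if pack_years else None,
                # The HPV column may be empty or absent entirely in the test set.
                hpv_status=_optional(row.get("hpv_status")),
            )
        )
    return records
```

For example, `[r for r in load_records(text) if r.hpv_status is None]` would select the cases whose HPV status is blinded. To read from disk instead of a string, pass a file opened with `open(path, newline="")` to `csv.DictReader` in place of the `io.StringIO` wrapper.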

Provider:
figshare
Created:
2016-08-16