Predicting Invasiveness of Lung Adenocarcinoma from Chest CT with Few-shot Vision-Language Ternary Classification Model

Source: Mendeley Data (indexed 2026-04-18)

Download link:

https://data.mendeley.com/datasets/h7b4ryrbzw

Resource description:

This dataset contains the research data used in the study “Predicting Invasiveness of Lung Adenocarcinoma from Chest CT with Few-shot Vision–Language Ternary Classification Model.” It includes data from 848 patients with pathologically confirmed lung adenocarcinoma, collected across four medical centers, and supports a study evaluating the GPT-4o vision–language model for ternary classification of pure ground-glass nodules (pGGNs). The model inputs are provided in MP4 format and organized into three folders by pathological subtype: preinvasive lesions (MP4_PRE; n = 333), minimally invasive adenocarcinomas (MP4_MIA; n = 376), and invasive adenocarcinomas (MP4_IAC; n = 139).

To promote transparency and reproducibility, the dataset also includes two supplementary scripts, "dicm_to_nii.py" and "nii_to_mp4.py", which document the anonymization and data conversion steps used in the study: the original DICOM-format CT images are converted to anonymized NIfTI (.nii.gz) files, and these are then converted to the MP4-format videos used as model inputs. This workflow gives researchers a reference for protecting patient privacy when applying online vision–language models to medical imaging data.

Because Mendeley Data limits storage to 10 GB, the upload contains only the anonymized video data used as inputs for the vision–language models (GPT-4o, Google Gemini 2.5 Pro, and Molmo), which occupy 9.94 GB in total. The CT images of all patients (81.1 GB in total) will be released through other databases that can provide the required capacity.

Users of this dataset should cite the following publication: “Predicting Invasiveness of Lung Adenocarcinoma from Chest CT with Few-shot Vision–Language Ternary Classification Model.”
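The three-folder layout described above lends itself to a simple label manifest. The following is a minimal sketch, not part of the dataset's own scripts: it assumes the three folders sit directly under a local download directory, that each contains one .mp4 file per patient, and that the integer labels 0/1/2 are an arbitrary choice made here. The function names `build_manifest` and `count_mismatches` are hypothetical.

```python
from pathlib import Path

# Folder names come from the dataset description; mapping them to
# integer labels 0/1/2 is an assumption made for this sketch.
LABELS = {"MP4_PRE": 0, "MP4_MIA": 1, "MP4_IAC": 2}

# Per-class patient counts stated in the dataset description.
EXPECTED_COUNTS = {"MP4_PRE": 333, "MP4_MIA": 376, "MP4_IAC": 139}


def build_manifest(root):
    """Return (video_path, label) pairs for every .mp4 in the class folders."""
    root = Path(root)
    manifest = []
    for folder, label in LABELS.items():
        # Sorting keeps the manifest order stable across runs.
        for video in sorted((root / folder).glob("*.mp4")):
            manifest.append((video, label))
    return manifest


def count_mismatches(manifest):
    """Return {folder: (found, expected)} for classes whose count differs."""
    found = {folder: 0 for folder in LABELS}
    for video, _ in manifest:
        found[video.parent.name] += 1
    return {
        folder: (found[folder], EXPECTED_COUNTS[folder])
        for folder in LABELS
        if found[folder] != EXPECTED_COUNTS[folder]
    }
```

Calling `count_mismatches(build_manifest("path/to/download"))` on a complete download should return an empty dictionary if the one-video-per-patient assumption holds; a non-empty result points to a partial download or a different file layout.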

Created:

2025-10-29
