Comparing the performance of GPT-4 Omni against specialised neural networks in identifying malignant dermatological lesions from smartphone images and structured clinical data

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://data.mendeley.com/datasets/3kdcc8cf92

Description:

The original structured clinical data for the 1000 test cases are stored in clinical_data_original.csv, and the imputed version of the clinical dataset in clinical_data_imputed.csv. The images for all 1000 test cases are stored in the Images folder in .png format.

The JSON files uploaded to the OpenAI API endpoint are in the Prompt JSON folder, separated by modality (clinical, img, multimodal) into the corresponding subfolders. Each prompt contains the system prompt, the text prompt, and, where applicable, the image in Base64 format. Because the API endpoint tended to return errors on larger JSON uploads, the 1000 cases are split into batches: 10 batches of 100 cases for clinical inputs, 50 batches of 20 cases for image inputs, and 100 batches of 10 cases for multimodal inputs.

The 150000 raw GPT responses (3 modalities × 50 trials × 1000 cases) are in the GPT Responses folder, separated by modality into the corresponding subfolders. Each modality subfolder holds 50 csv files, one per trial, and each csv file contains the responses for all 1000 cases.

The cleaned binary labels are in the Parsed Binary Labels folder, organised the same way: one subfolder per modality, 50 csv files per subfolder (one per trial), and all 1000 cases in each file. Each csv file contains a gpt_pred column holding the GPT-4o prediction, coded 0 (benign) or 1 (malignant), and a true_pred column holding the ground truth from the PAD-UFES-20 dataset.
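
The batch counts and the total number of responses quoted in the description can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper split_into_batches and the assumption that batches are contiguous runs of cases are hypothetical, since the description gives only the batch sizes, not how cases were assigned to batches.

```python
from typing import Dict, List, Sequence

N_CASES = 1000   # test cases, per the description
N_TRIALS = 50    # csv files (trials) per modality, per the description

# Cases per batch for each modality, as stated in the description.
BATCH_SIZES: Dict[str, int] = {"clinical": 100, "img": 20, "multimodal": 10}


def split_into_batches(case_ids: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split case IDs into consecutive batches (hypothetical helper).

    Assumes contiguous batching; the dataset may order cases differently.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        list(case_ids[start:start + batch_size])
        for start in range(0, len(case_ids), batch_size)
    ]


case_ids = list(range(N_CASES))

# Number of batches needed per modality.
batch_counts = {
    modality: len(split_into_batches(case_ids, size))
    for modality, size in BATCH_SIZES.items()
}

# One response per case, per trial, per modality.
total_responses = N_CASES * N_TRIALS * len(BATCH_SIZES)

print(batch_counts)
print(total_responses)
```

Under these assumptions the batch counts come out to 10, 50, and 100, and the total matches the 150000 responses stated above.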

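
A file from Parsed Binary Labels could be scored against the ground truth roughly as follows. This is a minimal sketch, not the authors' evaluation code: only the column names gpt_pred and true_pred come from the description, while the function names, the choice of metrics, and the five demo rows are made up for illustration. It also assumes the labels parse as numbers (for example "1" or "1.0").

```python
import csv
import io
from typing import Dict, Iterable

# Synthetic stand-in for one parsed-label csv file (NOT real dataset rows).
DEMO_CSV = "gpt_pred,true_pred\n1,1\n0,1\n0,0\n1,0\n0,0\n"


def confusion_counts(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count true/false positives and negatives, treating 1 (malignant) as positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for row in rows:
        pred = int(float(row["gpt_pred"]))
        truth = int(float(row["true_pred"]))
        if pred == 1 and truth == 1:
            counts["tp"] += 1
        elif pred == 1 and truth == 0:
            counts["fp"] += 1
        elif pred == 0 and truth == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, sensitivity, and specificity from confusion counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


counts = confusion_counts(csv.DictReader(io.StringIO(DEMO_CSV)))
metrics = summarize(counts)
print(counts)
print(metrics)
```

To score a real trial, the io.StringIO(DEMO_CSV) argument would be replaced by an open file handle for one of the 50 csv files in a modality subfolder; repeating this over all 50 files gives a per-trial distribution of each metric.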

Created:

2025-05-27