GastroVL, a multimodal dataset of 10,000 Esophagogastroduodenoscopy images with structured text descriptions

Name: GastroVL, a multimodal dataset of 10,000 Esophagogastroduodenoscopy images with structured text descriptions
Creator: Zenodo
Published: 2026-05-02 23:44:00
License: No description available

DataCite Commons: updated 2026-05-02, indexed 2026-05-07

Download link:

https://zenodo.org/doi/10.5281/zenodo.19984391

Description:

GastroVL is a multimodal dataset of 10,000 esophagogastroduodenoscopy (EGD) images, each paired with a structured text description, covering ten anatomical sites of the upper gastrointestinal tract. Annotation followed a rigorous multi-stage expert process intended to ensure reliability: senior endoscopists independently performed anatomical classification and text generation, and discrepancies were resolved through expert adjudication. Each structured description captures standardized features including location, morphology, size, margin, and colour, which makes the dataset directly suitable for training multimodal large language models (MLLMs) on tasks such as automated report generation, cross-modal retrieval, and fine-grained lesion understanding.

The dataset contains ten folders, each named after a specific anatomical location and prefixed with a two-digit numeric code: “01 esophagus”, “02 Z-line”, “03 cardia”, “04 fundus of stomach”, “05 body of stomach”, “06 incisura angularis”, “07 gastric antrum”, “08 pylorus”, “09 duodenal bulb”, and “10 descending duodenum”. Each anatomical location folder holds two subfolders named with a three-digit code: the first two digits encode the anatomical site, and the third digit indicates the clinical class (0 = normal, 1 = abnormal). Each class folder in turn holds two subfolders that store the data files: (1) image, which contains the EGD images in PNG format (e.g., 0100001.png); and (2) text, which contains the corresponding text descriptions in TXT format (e.g., 0100001.txt).

An image and its text description share the same base filename, so the two can be paired directly. An index file named index.csv is additionally provided at the root directory for more efficient programmatic access.
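The folder layout described above can be traversed with a short script. The following is a minimal Python sketch, not an official loader: the names `Sample` and `iter_pairs` are hypothetical, the dataset is assumed to be extracted to a local directory, and the columns of index.csv are not documented here, so the sketch pairs files by base filename instead of reading the index.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    """One image/text pair (hypothetical helper type, not part of the dataset)."""
    site_code: str    # two-digit anatomical site code, e.g. "01"
    site_name: str    # folder name after the code, e.g. "esophagus"
    abnormal: bool    # third digit of the class folder: 0 = normal, 1 = abnormal
    image_path: Path
    text_path: Path


def iter_pairs(root: Path) -> Iterator[Sample]:
    """Yield every PNG that has a TXT file with the same base filename."""
    for site_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Site folders look like "01 esophagus": code, space, name.
        site_code, _, site_name = site_dir.name.partition(" ")
        for class_dir in sorted(p for p in site_dir.iterdir() if p.is_dir()):
            code = class_dir.name
            # Class folders are three digits: site code plus 0 or 1.
            if len(code) != 3 or code[:2] != site_code or code[2] not in ("0", "1"):
                continue
            image_dir = class_dir / "image"
            text_dir = class_dir / "text"
            if not image_dir.is_dir():
                continue
            for image_path in sorted(image_dir.glob("*.png")):
                text_path = text_dir / (image_path.stem + ".txt")
                if text_path.is_file():  # skip images without a description
                    yield Sample(site_code, site_name, code[2] == "1",
                                 image_path, text_path)
```

Calling `list(iter_pairs(Path("GastroVL")))`, where `GastroVL` is an assumed local extraction path, would collect all pairs found on disk. Images without a matching text file are skipped silently, which may or may not be the desired behaviour for a given pipeline.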

Provider:

Zenodo

Created:

2026-05-02