Cross-Domain Data Alignment and Multi-Task Driven Language-Image Pre-Training for Radiology Tasks
Source: DataCite Commons. Updated: 2024-07-15. Indexed: 2025-04-16.
Download link:
https://ieee-dataport.org/documents/cross-domain-data-alignment-and-multi-task-driven-language-image-pre-training-radiology
Description:
Medical Vision-Language Pre-training (Medical-VLP) has advanced rapidly by learning representations from paired radiology reports. Nevertheless, two issues still restrict the development of Medical-VLP: the scarcity of parallel image-report pairs and the monotony of pre-training tasks. We therefore propose a Multi-Grained Cross-Domain Report Searching (CDRS) strategy and a Multi-Task Driven Language-Image Pre-Training (MLIP) framework. CDRS pairs report-less radiology images with matching reports through collaborative alignment training that bridges cross-domain data at three levels: in-domain cross-modal, out-of-domain self-supervised, and patch-wise. MLIP uses a unified model to compute multiple training objectives with minimal overhead by sharing the same computational graph. It integrates single-encoder, dual-encoder, and encoder-decoder paradigms, enabling the decoupled model to perform both discriminative and generative tasks, such as report generation and medical visual question answering. Extensive experiments and analyses demonstrate that our method achieves state-of-the-art performance across multiple discriminative and generative medical downstream tasks.
Provider:
IEEE DataPort
Created:
2024-07-15



