A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents
Source: DataCite Commons. Updated 2026-01-22; indexed 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=da7b29c9762c44e0860dac311cc55f60
Resource description:
Notice: If you use this open source content in papers, books, academic reports, or other works, please cite the following publication (the original link has the latest citation format):
Citation: WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian. A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250818
· Authors: WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian
· Corresponding author: HU Huiyang, huhuiyang22@mails.ucas.ac.cn
· Affiliations: the Aerospace Information Research Institute, Chinese Academy of Sciences; the University of Chinese Academy of Sciences; the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; the Key Laboratory of Target Cognition and Application Technology (TCAT), Aerospace Information Research Institute, Chinese Academy of Sciences
· Funding: Science and Disruptive Technology Program, AIRCAS (2025-AIRCAS-SDTP-04)
Open source content
1. A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents
Abstract: The advancement of multimodal foundation models has introduced new opportunities for intelligent agents that can jointly perform perception, cognition, and decision-making. However, the application of such models in the remote sensing (RS) domain remains limited, primarily due to the absence of large-scale, structured, and multimodality-aligned datasets that support multi-task learning. In this work, we introduce a comprehensive remote sensing multimodal instruction dataset tailored for unified modeling across 9 task categories and 21 sub-datasets, encompassing over 2 million samples. The dataset incorporates three major sensing modalities: optical, synthetic aperture radar (SAR), and infrared imagery. It provides standardized instruction formats, spatial annotations, and task-specific outputs.
Through unified data organization and structured instruction templates, we support a wide range of tasks including relation reasoning, instruction decomposition, UAV navigation planning, grounded captioning, and multimodal perception. We also provide benchmark results on remote sensing foundation models, demonstrating the dataset’s effectiveness in improving multimodal understanding and cross-task generalization. This dataset offers a valuable foundation for building intelligent RS agents and promotes future research in instruction-driven multimodal learning.
Provider:
Science Data Bank
Created:
2026-01-22



