A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents

Source: DataCite Commons. Updated: 2026-01-22. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=da7b29c9762c44e0860dac311cc55f60
Resource description:
Please note: If you use this open-source content in papers, books, academic reports, or other works, please cite the following document (the original link has the latest citation format).

Citation: WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian. A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250818
Authors: WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian
Corresponding author: HU Huiyang, huhuiyang22@mails.ucas.ac.cn
Affiliations: the Aerospace Information Research Institute, Chinese Academy of Sciences; the University of Chinese Academy of Sciences; the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; the Key Laboratory of Target Cognition and Application Technology (TCAT), Aerospace Information Research Institute, Chinese Academy of Sciences
Funding: Science and Disruptive Technology Program, AIRCAS (2025-AIRCAS-SDTP-04)

Open source content
1 A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents

Abstract: The advancement of multimodal foundation models has introduced new opportunities for intelligent agents that can jointly perform perception, cognition, and decision-making. However, the application of such models in the remote sensing (RS) domain remains limited, primarily due to the absence of large-scale, structured, and multimodality-aligned datasets that support multi-task learning. In this work, we introduce a comprehensive remote sensing multimodal instruction dataset tailored for unified modeling across 9 task categories and 21 sub-datasets, encompassing over 2 million samples. The dataset incorporates three major sensing modalities: optical, synthetic aperture radar (SAR), and infrared imagery. It provides standardized instruction formats, spatial annotations, and task-specific outputs.
Through unified data organization and structured instruction templates, we support a wide range of tasks including relation reasoning, instruction decomposition, UAV navigation planning, grounded captioning, and multimodal perception. We also provide benchmark results on remote sensing foundation models, demonstrating the dataset's effectiveness in improving multimodal understanding and cross-task generalization. This dataset offers a valuable foundation for building intelligent RS agents and promotes future research in instruction-driven multimodal learning.
Provider:
Science Data Bank
Created:
2026-01-22