five

richkoala/RSCC

收藏
Hugging Face2026-03-22 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/richkoala/RSCC
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en license: cc-by-4.0 size_categories: - 100K<n<1M task_categories: - image-to-text pretty_name: RSCC tags: - remote sensing - vision-language models - temporal image understanding configs: - config_name: benchmark data_files: - split: benchmark path: RSCC_qvq.jsonl - config_name: EBD data_files: - split: sample path: - EBD/EBD.tar.gz-part-0 - EBD/EBD.tar.gz-part-1 - EBD/EBD.tar.gz-part-2 - EBD/EBD.tar.gz-part-3 - EBD/EBD.tar.gz-part-4 --- # RSCC > [!IMPORTANT] > We found a great number of people are encountering the issue of accessing to our RSCC subset (see Issue [#6](https://github.com/Bili-Sakura/RSCC/issues/6)). Therefore, we release this subset via GoogleDrive, you can download from this [link](https://drive.google.com/file/d/1ZZ6_pN2Z9V-pDKVFfMs5uL5Xef96Tmiv/view?usp=sharing). > [!WARNING] > The user should strictly obey the [xBD License](https://www.xview2.org/). Also, we (RSCC Team) highlight the distribution of this subset data is for research purpose only. We will take down it if any copyright issue concerned. > [Paper](https://huggingface.co/papers/2509.01907) | [Project Page](https://bili-sakura.github.io/RSCC/) | [Code](https://github.com/Bili-Sakura/RSCC) > [!WARNING] > Due to xBD Licenses, we do not provide direct xBD images and masks. Users can get it via https://www.xview2.org/. > The test set of xBD mentioned in our paper can be directly obtained by selecting the first 26 pre- post- images pairs from 19 distinct xBD events to yield all 988=26 * 2 * 19 images ## Overview We introduce the Remote Sensing Change Caption (RSCC) dataset, a new benchmark designed to advance the development of large vision-language models for remote sensing. Existing image-text datasets typically rely on single-snapshot imagery and lack the temporal detail crucial for Earth observation tasks. By providing 62,351 pairs of pre-event and post-event images accompanied by detailed change captions, RSCC bridges this gap and enables robust disaster-awareness bi-temporal understanding. We demonstrate its utility through comprehensive experiments using interleaved multimodal large language models. Our results highlight RSCC’s ability to facilitate detailed disaster-related analysis, paving the way for more accurate, interpretable, and scalable vision-language applications in remote sensing. ![](./assets/rscc_overview2.png) ![](./assets/word_cloud.png) ![](./assets/word_length_distribution.png) ## Dataset Structure ```text ├── EBD/ │ └── <images>.tar.gz ├── xBD/ │ └── <images>.tar.gz └── xBD_subset/ │ └── <images>.tar.gz └── RSCC_qvq.jsonl ``` For detailed dataset usage guidelines, please refer to our GitHub Repo [RSCC](https://github.com/Bili-Sakura/RSCC). ## Sample Usage To infer with baseline models, first set up your environment by navigating to the project root and activating the `genai` conda environment: ```bash cd RSCC # path of project root conda env create -f environment.yaml # genai: env for most baseline models conda activate genai ``` Then, you can run the inference script with optional arguments for output paths and device specification: ```python python ./inference/xbd_subset_baseline.py # or you can specify the output file path, log file path and device python ./inference/xbd_subset_baseline.py --output_file "./output/xbd_subset_baseline.jsonl" --log_file "./logs/xbd_subset_baseline.log" --device "cuda:0" ``` ## Benchmark Results | Model | N-Gram | N-Gram | Contextual Similarity | Contextual Similarity | Avg_L | |-------|--------|----|----------------------|----|-------| | (#Activate Params) | ROUGE(%)↑ | METEOR(%)↑ | BERT(%)↑ | ST5-SCS(%)↑ | (#Words) | | BLIP-3 (3B) | 4.53 | 10.85 | 98.83 | 44.05 | <span style="color:red;">*456</span> | | &nbsp;&nbsp;+ Textual Prompt | 10.07 (<span style="color:green;">+5.54↑</span>) | 20.69 (<span style="color:green;">+9.84↑</span>) | 98.95 (<span style="color:green;">+0.12↑</span>) | 63.67 (<span style="color:green;">+19.62↑</span>) | <span style="color:red;">*302</span> | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 8.45 (<span style="color:red;">-1.62↓</span>) | 19.18 (<span style="color:red;">-1.51↓</span>) | 99.01 (<span style="color:green;">+0.06↑</span>) | 68.34 (<span style="color:green;">+4.67↑</span>) | <span style="color:red;">*354</span> | | Kimi-VL (3B) | 12.47 | 16.95 | 98.83 | 51.35 | 87 | | &nbsp;&nbsp;+ Textual Prompt | 16.83 (<span style="color:green;">+4.36↑</span>) | 25.47 (<span style="color:green;">+8.52↑</span>) | 99.22 (<span style="color:green;">+0.39↑</span>) | 70.75 (<span style="color:green;">+19.40↑</span>) | 108 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 16.83 (+0.00) | 25.39 (<span style="color:red;">-0.08↓</span>) | 99.30 (<span style="color:green;">+0.08↑</span>) | 69.97 (<span style="color:red;">-0.78↓</span>) | 109 | | Phi-4-Multimodal (4B) | 4.09 | 1.45 | 98.60 | 34.55 | 7 | | &nbsp;&nbsp;+ Textual Prompt | 17.08 (<span style="color:green;">+13.00↑</span>) | 19.70 (<span style="color:green;">+18.25↑</span>) | 98.93 (<span style="color:green;">+0.33↑</span>) | 67.62 (<span style="color:green;">+33.07↑</span>) | 75 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 17.05 (<span style="color:red;">-0.03↓</span>) | 19.09 (<span style="color:red;">-0.61↓</span>) | 98.90 (<span style="color:red;">-0.03↓</span>) | 66.69 (<span style="color:red;">-0.93↓</span>) | 70 | | Qwen2-VL (7B) | 11.02 | 9.95 | 99.11 | 45.55 | 42 | | &nbsp;&nbsp;+ Textual Prompt | 19.04 (<span style="color:green;">+8.02↑</span>) | 25.20 (<span style="color:green;">+15.25↑</span>) | 99.01 (<span style="color:red;">-0.10↓</span>) | 72.65 (<span style="color:green;">+27.10↑</span>) | 84 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 18.43 (<span style="color:red;">-0.61↓</span>) | 25.03 (<span style="color:red;">-0.17↓</span>) | 99.03 (<span style="color:green;">+0.02↑</span>) | 72.89 (<span style="color:green;">+0.24↑</span>) | 88 | | LLaVA-NeXT-Interleave (8B) | 12.51 | 13.29 | 99.11 | 46.99 | 57 | | &nbsp;&nbsp;+ Textual Prompt | 16.09 (<span style="color:green;">+3.58↑</span>) | 20.73 (<span style="color:green;">+7.44↑</span>) | 99.22 (<span style="color:green;">+0.11↑</span>) | 62.60 (<span style="color:green;">+15.61↑</span>) | 75 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 15.76 (<span style="color:red;">-0.33↓</span>) | 21.17 (<span style="color:green;">+0.44↑</span>) | 99.24 (<span style="color:green;">+0.02↑</span>) | 65.75 (<span style="color:green;">+3.15↑</span>) | 88 | | LLaVA-OneVision (8B) | 8.40 | 10.97 | 98.64 | 46.15 | <span style="color:red;">*221</span> | | &nbsp;&nbsp;+ Textual Prompt | 11.15 (<span style="color:green;">+2.75↑</span>) | 19.09 (<span style="color:green;">+8.12↑</span>) | 98.85 (<span style="color:green;">+0.21↑</span>) | 70.08 (<span style="color:green;">+23.93↑</span>) | <span style="color:red;">*285</span> | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 10.68 (<span style="color:red;">-0.47↓</span>) | 18.27 (<span style="color:red;">-0.82↓</span>) | 98.79 (<span style="color:red;">-0.06↓</span>) | 69.34 (<span style="color:red;">-0.74↓</span>) | <span style="color:red;">*290</span> | | InternVL 3 (8B) | 12.76 | 15.77 | 99.31 | 51.84 | 64 | | &nbsp;&nbsp;+ Textual Prompt | _19.81_ (<span style="color:green;">+7.05↑</span>) | _28.51_ (<span style="color:green;">+12.74↑</span>) | **99.55** (<span style="color:green;">+0.24↑</span>) | 78.57 (<span style="color:green;">+26.73↑</span>) | 81 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 19.70 (<span style="color:red;">-0.11↓</span>) | 28.46 (<span style="color:red;">-0.05↓</span>) | 99.51 (<span style="color:red;">-0.04↓</span>) | **79.18** (<span style="color:green;">+0.61↑</span>) | 84 | | Pixtral (12B) | 12.34 | 15.94 | 99.34 | 49.36 | 70 | | &nbsp;&nbsp;+ Textual Prompt | **19.87** (<span style="color:green;">+7.53↑</span>) | **29.01** (<span style="color:green;">+13.07↑</span>) | 99.51 (<span style="color:green;">+0.17↑</span>) | _79.07_ (<span style="color:green;">+29.71↑</span>) | 97 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 19.03 (<span style="color:red;">-0.84↓</span>) | 28.44 (<span style="color:red;">-0.57↓</span>) | _99.52_ (<span style="color:green;">+0.01↑</span>) | 78.71 (<span style="color:red;">-0.36↓</span>) | 102 | | CCExpert (7B) | 7.61 | 4.32 | 99.17 | 40.81 | 12 | | &nbsp;&nbsp;+ Textual Prompt | 8.71 (<span style="color:green;">+1.10↑</span>) | 5.35 (<span style="color:green;">+1.03↑</span>) | 99.23 (<span style="color:green;">+0.06↑</span>) | 47.13 (<span style="color:green;">+6.32↑</span>) | 14 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 8.84 (<span style="color:green;">+0.13↑</span>) | 5.41 (<span style="color:green;">+0.06↑</span>) | 99.23 (+0.00) | 46.58 (<span style="color:red;">-0.55↓</span>) | 14 | | TEOChat (7B) | 7.86 | 5.77 | 98.99 | 52.64 | 15 | | &nbsp;&nbsp;+ Textual Prompt | 11.81 (<span style="color:green;">+3.95↑</span>) | 10.24 (<span style="color:green;">+4.47↑</span>) | 99.12 (<span style="color:green;">+0.13↑</span>) | 61.73 (<span style="color:green;">+9.09↑</span>) | 22 | | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ Visual Prompt | 11.55 (<span style="color:red;">-0.26↓</span>) | 10.04 (<span style="color:red;">-0.20↓</span>) | 99.09 (<span style="color:red;">-0.03↓</span>) | 62.53 (<span style="color:green;">+0.80↑</span>) | 22 | ![](/assets/win_rate_plot.png) ## Qualitative Results ### Baseline Models (RSCC: xBD subset) ![](./assets/qualitative_results1.png) ![](./assets/qualitative_results2.png) ### Large Models (RSCC: EBD samples) ![](./assets/qualitative_results3.png) ![](./assets/qualitative_results4.png) ![](./assets/qualitative_results5.png) ## Licensing Information The dataset is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/deed.en), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ## 🙏 Acknowledgement Our RSCC dataset is built based on [xBD](https://www.xview2.org/) and [EBD](https://figshare.com/articles/figure/An_Extended_Building_Damage_EBD_dataset_constructed_from_disaster-related_bi-temporal_remote_sensing_images_/25285009) datasets. We are thankful to [Kimi-VL](https://hf-mirror.com/moonshotai/Kimi-VL-A3B-Instruct), [BLIP-3](https://hf-mirror.com/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5), [Phi-4-Multimodal](https://hf-mirror.com/microsoft/Phi-4-multimodal-instruct), [Qwen2-VL](https://hf-mirror.com/Qwen/Qwen2-VL-7B-Instruct), [Qwen2.5-VL](https://hf-mirror.com/Qwen/Qwen2.5-VL-72B-Instruct), [LLaVA-NeXT-Interleave](https://hf-mirror.com/llava-hf/llava-interleave-qwen-7b-hf),[LLaVA-OneVision](https://hf-mirror.com/llava-hf/llava-onevision-qwen2-7b-ov-hf), [InternVL 3](https://hf-mirror.com/OpenGVLab/InternVL3-8B), [Pixtral](https://hf-mirror.com/mistralai/Pixtral-12B-2409), [TEOChat](https://github.com/ermongroup/TEOChat) and [CCExpert](https://github.com/Meize0729/CCExpert) for releasing their models and code as open-source contributions. The metrics implements are derived from [huggingface/evaluate](https://github.com/huggingface/evaluate). The training implements are derived from [QwenLM/Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL). ## 📜 Citation ```bibtex @misc{chen2025rscclargescaleremotesensing, title={RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events}, author={Zhenyuan Chen and Chenxi Wang and Ningyu Zhang and Feng Zhang}, year={2025}, eprint={2509.01907}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2509.01907}, } ```
提供机构:
richkoala
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作