richkoala/RSCC

Name: richkoala/RSCC
Creator: richkoala
Published: 2026-03-22 04:42:42
License: not specified

Source: Hugging Face. Updated: 2026-03-22. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/richkoala/RSCC


Description:

---
language:
- en
license: cc-by-4.0
size_categories:
- 100K<n<1M
task_categories:
- image-to-text
pretty_name: RSCC
tags:
- remote sensing
- vision-language models
- temporal image understanding
configs:
- config_name: benchmark
  data_files:
  - split: benchmark
    path: RSCC_qvq.jsonl
- config_name: EBD
  data_files:
  - split: sample
    path:
    - EBD/EBD.tar.gz-part-0
    - EBD/EBD.tar.gz-part-1
    - EBD/EBD.tar.gz-part-2
    - EBD/EBD.tar.gz-part-3
    - EBD/EBD.tar.gz-part-4
---

# RSCC

> [!IMPORTANT]
> Many people have had trouble accessing our RSCC subset (see Issue [#6](https://github.com/Bili-Sakura/RSCC/issues/6)). We have therefore released this subset via Google Drive; you can download it from this [link](https://drive.google.com/file/d/1ZZ6_pN2Z9V-pDKVFfMs5uL5Xef96Tmiv/view?usp=sharing).

> [!WARNING]
> Users must strictly comply with the [xBD License](https://www.xview2.org/). We (the RSCC Team) also stress that this subset is distributed for research purposes only. We will take it down if any copyright concern arises.

> [Paper](https://huggingface.co/papers/2509.01907) | [Project Page](https://bili-sakura.github.io/RSCC/) | [Code](https://github.com/Bili-Sakura/RSCC)

> [!WARNING]
> Due to the xBD license, we do not provide the xBD images and masks directly. Users can obtain them via https://www.xview2.org/.
>
> The xBD test set mentioned in our paper can be obtained by selecting the first 26 pre-/post-event image pairs from each of 19 distinct xBD events, yielding 988 = 26 * 2 * 19 images.

## Overview

We introduce the Remote Sensing Change Caption (RSCC) dataset, a new benchmark designed to advance the development of large vision-language models for remote sensing. Existing image-text datasets typically rely on single-snapshot imagery and lack the temporal detail crucial for Earth observation tasks.
By providing 62,351 pairs of pre-event and post-event images accompanied by detailed change captions, RSCC bridges this gap and enables robust, disaster-aware bi-temporal understanding. We demonstrate its utility through comprehensive experiments using interleaved multimodal large language models. Our results highlight RSCC’s ability to facilitate detailed disaster-related analysis, paving the way for more accurate, interpretable, and scalable vision-language applications in remote sensing.

![](./assets/rscc_overview2.png)

![](./assets/word_cloud.png)

![](./assets/word_length_distribution.png)

## Dataset Structure

```text
├── EBD/
│   └── <images>.tar.gz
├── xBD/
│   └── <images>.tar.gz
├── xBD_subset/
│   └── <images>.tar.gz
└── RSCC_qvq.jsonl
```

For detailed dataset usage guidelines, please refer to our GitHub repo [RSCC](https://github.com/Bili-Sakura/RSCC).

## Sample Usage

To run inference with the baseline models, first go to the project root, then create and activate the `genai` conda environment:

```bash
cd RSCC # path of project root
conda env create -f environment.yaml # genai: env for most baseline models
conda activate genai
```

Then run the inference script, optionally specifying the output file path, log file path, and device:

```bash
python ./inference/xbd_subset_baseline.py
# or you can specify the output file path, log file path and device
python ./inference/xbd_subset_baseline.py --output_file "./output/xbd_subset_baseline.jsonl" --log_file "./logs/xbd_subset_baseline.log" --device "cuda:0"
```

## Benchmark Results

| Model | N-Gram | N-Gram | Contextual Similarity | Contextual Similarity | Avg_L |
|-------|--------|----|----------------------|----|-------|
| (#Activated Params) | ROUGE(%)↑ | METEOR(%)↑ | BERT(%)↑ | ST5-SCS(%)↑ | (#Words) |
| BLIP-3 (3B) | 4.53 | 10.85 | 98.83 | 44.05 | *456 |
|   + Textual Prompt | 10.07 (+5.54↑) | 20.69 (+9.84↑) | 98.95 (+0.12↑) | 63.67 (+19.62↑) | *302 |
|       + Visual Prompt | 8.45 (-1.62↓) | 19.18 (-1.51↓) | 99.01 (+0.06↑) | 68.34 (+4.67↑) | *354 |
| Kimi-VL (3B) | 12.47 | 16.95 | 98.83 | 51.35 | 87 |
|   + Textual Prompt | 16.83 (+4.36↑) | 25.47 (+8.52↑) | 99.22 (+0.39↑) | 70.75 (+19.40↑) | 108 |
|       + Visual Prompt | 16.83 (+0.00) | 25.39 (-0.08↓) | 99.30 (+0.08↑) | 69.97 (-0.78↓) | 109 |
| Phi-4-Multimodal (4B) | 4.09 | 1.45 | 98.60 | 34.55 | 7 |
|   + Textual Prompt | 17.08 (+13.00↑) | 19.70 (+18.25↑) | 98.93 (+0.33↑) | 67.62 (+33.07↑) | 75 |
|       + Visual Prompt | 17.05 (-0.03↓) | 19.09 (-0.61↓) | 98.90 (-0.03↓) | 66.69 (-0.93↓) | 70 |
| Qwen2-VL (7B) | 11.02 | 9.95 | 99.11 | 45.55 | 42 |
|   + Textual Prompt | 19.04 (+8.02↑) | 25.20 (+15.25↑) | 99.01 (-0.10↓) | 72.65 (+27.10↑) | 84 |
|       + Visual Prompt | 18.43 (-0.61↓) | 25.03 (-0.17↓) | 99.03 (+0.02↑) | 72.89 (+0.24↑) | 88 |
| LLaVA-NeXT-Interleave (8B) | 12.51 | 13.29 | 99.11 | 46.99 | 57 |
|   + Textual Prompt | 16.09 (+3.58↑) | 20.73 (+7.44↑) | 99.22 (+0.11↑) | 62.60 (+15.61↑) | 75 |
|       + Visual Prompt | 15.76 (-0.33↓) | 21.17 (+0.44↑) | 99.24 (+0.02↑) | 65.75 (+3.15↑) | 88 |
| LLaVA-OneVision (8B) | 8.40 | 10.97 | 98.64 | 46.15 | *221 |
|   + Textual Prompt | 11.15 (+2.75↑) | 19.09 (+8.12↑) | 98.85 (+0.21↑) | 70.08 (+23.93↑) | *285 |
|       + Visual Prompt | 10.68 (-0.47↓) | 18.27 (-0.82↓) | 98.79 (-0.06↓) | 69.34 (-0.74↓) | *290 |
| InternVL 3 (8B) | 12.76 | 15.77 | 99.31 | 51.84 | 64 |
|   + Textual Prompt | _19.81_ (+7.05↑) | _28.51_ (+12.74↑) | **99.55** (+0.24↑) | 78.57 (+26.73↑) | 81 |
|       + Visual Prompt | 19.70 (-0.11↓) | 28.46 (-0.05↓) | 99.51 (-0.04↓) | **79.18** (+0.61↑) | 84 |
| Pixtral (12B) | 12.34 | 15.94 | 99.34 | 49.36 | 70 |
|   + Textual Prompt | **19.87** (+7.53↑) | **29.01** (+13.07↑) | 99.51 (+0.17↑) | _79.07_ (+29.71↑) | 97 |
|       + Visual Prompt | 19.03 (-0.84↓) | 28.44 (-0.57↓) | _99.52_ (+0.01↑) | 78.71 (-0.36↓) | 102 |
| CCExpert (7B) | 7.61 | 4.32 | 99.17 | 40.81 | 12 |
|   + Textual Prompt | 8.71 (+1.10↑) | 5.35 (+1.03↑) | 99.23 (+0.06↑) | 47.13 (+6.32↑) | 14 |
|       + Visual Prompt | 8.84 (+0.13↑) | 5.41 (+0.06↑) | 99.23 (+0.00) | 46.58 (-0.55↓) | 14 |
| TEOChat (7B) | 7.86 | 5.77 | 98.99 | 52.64 | 15 |
|   + Textual Prompt | 11.81 (+3.95↑) | 10.24 (+4.47↑) | 99.12 (+0.13↑) | 61.73 (+9.09↑) | 22 |
|       + Visual Prompt | 11.55 (-0.26↓) | 10.04 (-0.20↓) | 99.09 (-0.03↓) | 62.53 (+0.80↑) | 22 |

![](./assets/win_rate_plot.png)

## Qualitative Results

### Baseline Models (RSCC: xBD subset)

![](./assets/qualitative_results1.png)

![](./assets/qualitative_results2.png)

### Large Models (RSCC: EBD samples)

![](./assets/qualitative_results3.png)

![](./assets/qualitative_results4.png)

![](./assets/qualitative_results5.png)

## Licensing Information

The dataset is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/deed.en) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## 🙏 Acknowledgement

Our RSCC dataset is built on the [xBD](https://www.xview2.org/) and [EBD](https://figshare.com/articles/figure/An_Extended_Building_Damage_EBD_dataset_constructed_from_disaster-related_bi-temporal_remote_sensing_images_/25285009) datasets.
We are thankful to [Kimi-VL](https://hf-mirror.com/moonshotai/Kimi-VL-A3B-Instruct), [BLIP-3](https://hf-mirror.com/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5), [Phi-4-Multimodal](https://hf-mirror.com/microsoft/Phi-4-multimodal-instruct), [Qwen2-VL](https://hf-mirror.com/Qwen/Qwen2-VL-7B-Instruct), [Qwen2.5-VL](https://hf-mirror.com/Qwen/Qwen2.5-VL-72B-Instruct), [LLaVA-NeXT-Interleave](https://hf-mirror.com/llava-hf/llava-interleave-qwen-7b-hf), [LLaVA-OneVision](https://hf-mirror.com/llava-hf/llava-onevision-qwen2-7b-ov-hf), [InternVL 3](https://hf-mirror.com/OpenGVLab/InternVL3-8B), [Pixtral](https://hf-mirror.com/mistralai/Pixtral-12B-2409), [TEOChat](https://github.com/ermongroup/TEOChat) and [CCExpert](https://github.com/Meize0729/CCExpert) for releasing their models and code as open-source contributions.

The metric implementations are derived from [huggingface/evaluate](https://github.com/huggingface/evaluate).

The training implementation is derived from [QwenLM/Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL).

## 📜 Citation

```bibtex
@misc{chen2025rscclargescaleremotesensing,
      title={RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events},
      author={Zhenyuan Chen and Chenxi Wang and Ningyu Zhang and Feng Zhang},
      year={2025},
      eprint={2509.01907},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.01907},
}
```
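
The xBD test split described in the warning note near the top of this card (the first 26 pre-/post-event pairs from each of 19 events, 988 images in total) can be sketched as follows. This is a minimal illustration, not the authors' script. It assumes xBD-style file names of the form `<event>_<id>_pre_disaster.png` and `<event>_<id>_post_disaster.png`, and it assumes that "first" means ascending tile ID. The helper name `select_test_pairs` is hypothetical.

```python
from collections import defaultdict

PAIRS_PER_EVENT = 26  # from the card: first 26 pairs per event


def select_test_pairs(filenames, pairs_per_event=PAIRS_PER_EVENT):
    """Return {event: [(pre_name, post_name), ...]} with the first N complete pairs.

    Assumes names like 'hurricane-harvey_00000001_pre_disaster.png'. This
    naming pattern is an assumption; the card itself does not state it.
    """
    # tiles[event][tile_id] -> {"pre": name, "post": name}
    tiles = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        parts = stem.rsplit("_", 3)  # <event>, <id>, <pre|post>, "disaster"
        if len(parts) != 4 or parts[3] != "disaster" or parts[2] not in ("pre", "post"):
            continue  # skip files that do not match the assumed pattern
        event, tile_id, phase = parts[0], parts[1], parts[2]
        tiles[event][tile_id][phase] = name

    selected = {}
    for event in sorted(tiles):
        # Keep only tiles that have both images, ordered by tile ID (string sort,
        # which matches numeric order when IDs are zero-padded).
        complete = [
            (t["pre"], t["post"])
            for _, t in sorted(tiles[event].items())
            if "pre" in t and "post" in t
        ]
        selected[event] = complete[:pairs_per_event]
    return selected


if __name__ == "__main__":
    # Synthetic file names only: 19 made-up events with 30 tiles each.
    names = [
        f"event-{e:02d}_{t:08d}_{phase}_disaster.png"
        for e in range(19)
        for t in range(30)
        for phase in ("pre", "post")
    ]
    pairs = select_test_pairs(names)
    print(sum(2 * len(p) for p in pairs.values()))  # 19 * 26 * 2 = 988
```

With 19 events and at least 26 complete pairs per event, the selection always contains 19 * 26 pairs, which gives the 988 images quoted in the card.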

Provider:

richkoala
