damlacetinkayaa/kate-cd

Name: damlacetinkayaa/kate-cd
Creator: damlacetinkayaa
Published: 2026-03-24 11:08:41
License: No description available

Hugging Face, updated 2026-03-24, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/damlacetinkayaa/kate-cd


Resource description:

---
dataset_info:
  features:
  - name: pre_image
    dtype: image
  - name: post_image
    dtype: image
  - name: label
    dtype: image
  splits:
  - name: train
    num_bytes: 371521379.0
    num_examples: 404
  - name: validation
    num_bytes: 40554801.0
    num_examples: 44
  - name: test
    num_bytes: 34643621.0
    num_examples: 38
  download_size: 446440710
  dataset_size: 446719801.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Welcome to the KATE-CD Dataset

Welcome to the home page of the Kahramanmaraş Türkiye Earthquake-Change Detection Dataset (KATE-CD). If you are reading this README, you are probably visiting one of the following places to learn more about the **KATE-CD Dataset** and the associated study "*Earthquake Damage Assessment with SAMCD: A Change Detection Approach for VHR Images*", to appear in the Journal of Applied Remote Sensing.

* [Code Ocean Capsule](https://doi.org/10.24433/CO.3747729.v1) in [Open Science Library](https://codeocean.com/explore/82765786-a936-438c-a75a-84e2817294c5)
* [GitHub Repository](https://github.com/cscrs/kate-cd)
* [HuggingFace](https://huggingface.co/datasets/cscrs/kate-cd)

The **KATE-CD Dataset**, the associated code and the supplementary information are published in three places (CodeOcean, GitHub and HuggingFace) to provide redundancy and extended reach. The content uploaded to the three websites is the same, apart from small differences imposed by platform requirements. CodeOcean is mainly used for reproducibility, whereas GitHub provides git access and hence easy collaboration between **KATE-CD Dataset** developers. Finally, HuggingFace provides easy access to the database, so you can run existing HuggingFace models on KATE-CD with little effort.

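As a quick sanity check on the metadata above, the sketch below sums the split sizes and shows one way a split might be loaded with the Hugging Face `datasets` library. The repository id `cscrs/kate-cd` and the split names come from this README; the helper names `split_fractions` and `load_split` are illustrative assumptions, and `load_split` needs network access and the `datasets` package.

```python
# Split sizes copied from the dataset_info metadata in this README.
SPLIT_SIZES = {"train": 404, "validation": 44, "test": 38}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split(split="train", repo_id="cscrs/kate-cd"):
    """Load one split from the Hub (hypothetical helper; requires `datasets`)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(repo_id, split=split)


if __name__ == "__main__":
    print("total pairs:", sum(SPLIT_SIZES.values()))
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

Each record returned by `load_split` should expose the three image columns listed in the metadata (`pre_image`, `post_image`, `label`).
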
## Content of the KATE-CD Dataset

The KATE-CD dataset is designed to facilitate the development and evaluation of change detection algorithms for earthquake damage assessment. It provides high-resolution, bitemporal satellite imagery captured before and after the earthquake, covering the regions affected by the Kahramanmaraş earthquake in Türkiye. Because earthquake damage assessment datasets are scarce, KATE-CD aims to bridge this gap by offering high-quality annotated data that researchers can use to train and test machine learning models for automated damage detection.

### Source of Satellite Imagery

The dataset includes satellite images from Maxar Open Data and Airbus Pleiades, covering seven heavily affected cities: Adıyaman, Gaziantep, Hatay, Kahramanmaraş, Kilis, Osmaniye, and Malatya. The images have a resolution ranging from 0.3 m to 0.5 m. The selected imagery was captured under various lighting conditions, with different sensors and viewing angles. The coordinate reference system EPSG:32637 was chosen for consistency, and radiometrically corrected images with 8-bit spectral resolution were used to maintain uniform color representation across sources.

### Labelling Process

A grid-based labeling approach was used to divide the images into 512×512 pixel patches. The Label Studio tool was employed for manual annotation: 834 post-earthquake images were reviewed, and damaged buildings were marked with polygonal annotations. Each labeled image was then paired with its corresponding pre-earthquake image, resulting in 486 pre/post image pairs for change detection. A binary labeling strategy was applied: pixels inside damage polygons were assigned a value of 1 (damaged), and all others were set to 0 (undamaged).

### Machine-Learning Ready Format

To integrate with change detection frameworks, the dataset was structured into a standardized format.
This included organizing image pairs into directories suitable for model training. The dataset was made publicly available on CodeOcean, GitHub, and HuggingFace to support reproducibility and accessibility for researchers.

## Reproducibility: CodeOcean

The dataset is published on three platforms: CodeOcean, GitHub and HuggingFace. The purpose of CodeOcean is to provide the data, the code and the computing instructions needed to reproduce the results. CodeOcean uses the term *capsule* for the collection of everything needed to reproduce the results. Depending on your goal and your time constraints, CodeOcean provides two alternatives for running the capsule and obtaining the results: via Open Science Library or via Capsule Export.

### Open Science Library

If you visit [this capsule](https://doi.org/10.24433/CO.3747729.v1) via the [Open Science Library](https://codeocean.com/explore/82765786-a936-438c-a75a-84e2817294c5) developed by [Code Ocean](https://codeocean.com), you should be able to see the published results in the results folder of the capsule. Code Ocean has an internal publishing process to verify that the capsule produces the same results on each run. So, if you are in a hurry or do not want to run the capsule again, you can look at the published results and check the code and data in the capsule.

If you want to run the capsule and produce the results yourself, all you have to do is click the "Reproducible Run" button on the capsule page. The [Open Science Library](https://codeocean.com/explore/82765786-a936-438c-a75a-84e2817294c5) will run the capsule from scratch and produce the results.

### Capsule Export

If you would like to use your own computing resources for reproduction, you can export the capsule to your working environment via the "Capsule" --> "Export" menu. Please make sure to check the "Include Data" option while exporting. After extracting the export file, follow the instructions in "REPRODUCING.md".
For the sake of completeness, we repeat the procedures here.

#### Prerequisites

- [Docker Community Edition (CE)](https://www.docker.com/community-edition)

#### Building the environment

This capsule has been published and its environment has been archived and made available on Code Ocean's Docker registry: `registry.codeocean.com/published/82765786-a936-438c-a75a-84e2817294c5:v1`

### Running the capsule to reproduce the results

In your terminal, navigate to the folder where you've extracted the capsule and execute the following command, adjusting parameters as needed:

```shell
docker run --platform linux/amd64 --rm --gpus all \
  --workdir /code \
  --volume "$PWD/data":/data \
  --volume "$PWD/code":/code \
  --volume "$PWD/results":/results \
  registry.codeocean.com/published/82765786-a936-438c-a75a-84e2817294c5:v1 bash run
```

## Published results

In the results folder of the CodeOcean capsule, you can find the pre-computed outputs of the code, or you can generate them from scratch with a single click in CodeOcean. In either case, these outputs correspond to the content published in the manuscript. The mapping between capsule results and the content in the manuscript is as follows:

    Code              Outputs           Manuscript
    ------------      ----------------  ----------
    predictions.py    val_scores.txt    Table II
    evaluate.py       train_scores.txt
                      test_scores.txt
    ---------------------------------------------
    visualization.py  val_plots.pdf     Figure 2
                      train_plots.pdf
                      test_plots.pdf
    ---------------------------------------------

## For Developers

### Differences between the platforms:

* CodeOcean is the primary source of the dataset (*data* folder) and the code (*code* folder).
* GitHub does not contain the *data* folder because GitHub is not designed to store and manage large files.
* [HuggingFace dataset](https://huggingface.co/datasets/cscrs/kate-cd) is hosted on an isolated repository with Git Large File Storage (LFS). In this isolated repo, Parquet files of the original *data* folder are served.
  It also includes a README file with auto-generated metadata for visual presentation on HuggingFace.

### GitHub Access

If you would like to look at the capsule more closely and build a working development environment, you can use the [Development Containers](https://containers.dev/) functionality of VSCode. For this purpose, we created the **.devcontainer/devcontainer.json** file under the root folder of the capsule. This config file tells VSCode where the **Dockerfile** is located. By design, we use the same **Dockerfile** provided by CodeOcean, so that we do not interfere with the build process in CodeOcean.

To open the GitHub repository in DevContainers, you can click the button below. It will open VSCode in DevContainers mode and fetch the GitHub repository automatically.

[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/cscrs/kate-cd.git)

### CodeOcean Access

To open the capsule in VSCode via [Development Containers](https://containers.dev/), you first need to download the capsule. There are two ways: you can either use the capsule export or pull the capsule from a git repository. We recommend using git. You can use either the CodeOcean or the GitHub repository (both have the same content).

~~~bash
# CodeOcean git repository
# There are two git repos in CodeOcean.
# (1) Repo of the published capsule (https://git.codeocean.com/capsule-9061546.git)
# (2) Repo of the original capsule (https://git.codeocean.com/capsule-3747729.git)
# Here, we are using the git repo of the original capsule
git clone https://git.codeocean.com/capsule-3747729.git
~~~

or

~~~bash
# GitHub git repository
git clone https://github.com/cscrs/kate-cd.git
~~~

The next step is to open VSCode, select *Open a Remote Window* and then the *Open Folder in Container...* option. Select your cloned git folder, and VSCode should start building the Docker container and open the content of the capsule.

### HuggingFace Access

[HuggingFace](https://huggingface.co/datasets/cscrs/kate-cd) is used to host the database and to give developers convenient visual access to it. HuggingFace uses the Parquet format to host the database and Git Large File Storage (LFS) for its internal git repository. Therefore, we decided to isolate the git repository of HuggingFace from GitHub and CodeOcean. The git repository of HuggingFace hosts only the database (in Parquet format) and a README.

The Parquet files in the HuggingFace repository are updated via:

    cd code
    python utils/hf_update_db.py

### Relative vs Absolute paths

We use relative paths to locate the data files in the code to achieve compatibility between different working environments. In this way, the same code and data structure can be used without any change whether the capsule runs on the [Open Science Library](https://codeocean.com/explore/82765786-a936-438c-a75a-84e2817294c5) or in a local development environment.

The only requirement of the relative-path approach is that Python code must be run from within the **code** folder, like this:

~~~bash
cd code
python predictions.py
~~~

This approach also matches how CodeOcean runs the capsule.

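The relative-path convention can be sketched as follows. This is a minimal illustration, not the capsule's actual code: it assumes that the *data* and *results* folders sit next to *code* (as the `/data`, `/code` and `/results` volume mounts in the Docker command suggest), and the helper name `capsule_paths` is hypothetical.

```python
from pathlib import Path


def capsule_paths(code_dir="."):
    """Resolve the sibling folders relative to the `code` folder.

    Assumption: `data` and `results` live next to `code`, mirroring the
    /data, /code and /results mounts used when running the capsule.
    """
    code_dir = Path(code_dir).resolve()
    root = code_dir.parent
    return {
        "code": code_dir,
        "data": root / "data",
        "results": root / "results",
    }


if __name__ == "__main__":
    # Run from inside the code folder so that "." is the code directory.
    for name, path in capsule_paths().items():
        print(f"{name}: {path}")
```

Resolving everything from the current working directory is what makes the "run from within **code**" requirement necessary: started from another folder, the same script would point at the wrong siblings.
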
### Reproducibility results folder

If you visit the [Open Science Library](https://codeocean.com/explore/82765786-a936-438c-a75a-84e2817294c5), you will see that published results are always placed under the **results** folder. This is a special folder that CodeOcean uses to store outputs such as PDFs, PNGs, or ordinary text files. For that reason, the **results** folder is not part of the *git* structure of a CodeOcean capsule, so you won't see it when you pull or export a capsule. Whenever you create an output, you should create the **results** folder and put the output under it. For the same reason, you should not add it to git.
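
Because the **results** folder is absent after a pull or export, output-writing code may want to create it on demand. The following is a minimal sketch under the assumption that the folder sits next to *code*; the function name `write_result` and its default location are illustrative and not taken from the capsule.

```python
from pathlib import Path


def write_result(filename, text, results_dir="../results"):
    """Write a text output under the results folder, creating it if needed."""
    out_dir = Path(results_dir)
    # parents/exist_ok make the call safe whether or not the folder exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Called from inside the **code** folder, `write_result("notes.txt", "placeholder")` would create `../results/notes.txt`; the file name here is only a placeholder.
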


Provider:

damlacetinkayaa
