
jfang/CraterBench-R

Hugging Face | Updated 2026-04-09 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/jfang/CraterBench-R
Description:
---
license: cc-by-4.0
size_categories:
- 10K<n<100K
task_categories:
- image-to-image
- image-feature-extraction
pretty_name: CraterBench-R
tags:
- mars
- crater
- retrieval
- remote-sensing
- planetary-science
- vision-transformer
---

# CraterBench-R

[Paper](https://huggingface.co/papers/2604.06245)

CraterBench-R is an instance-level crater retrieval benchmark built from Mars CTX imagery. This repository contains the released benchmark images, official split files, relevance metadata, and a minimal evaluation example.

## Summary

- `25,000` crater identities in the benchmark gallery
- `50,000` gallery images with two canonical context crops per crater
- `1,000` held-out query crater identities
- `5,000` manually verified query images with five views per query crater
- official `train` and `test` split manifests with relative paths only

## Download

```bash
# Download the repository
pip install huggingface_hub
huggingface-cli download jfang/CraterBench-R --repo-type dataset --local-dir CraterBench-R
cd CraterBench-R

# Extract images
unzip images.zip
```

After extraction the directory should contain `images/gallery/` (50,000 JPEGs) and `images/query/` (5,000 JPEGs).

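The expected image counts after extraction can be checked with a short script. This is a minimal sketch, not part of the repository: the helper names `count_jpegs` and `check_extraction` are hypothetical, and it assumes the images carry a `.jpg` extension and sit directly inside `images/gallery/` and `images/query/`.

```python
from pathlib import Path

# Expected counts come from the Download section; adjust if the release changes.
EXPECTED = {"gallery": 50_000, "query": 5_000}


def count_jpegs(data_root):
    """Count *.jpg files directly under images/gallery and images/query."""
    root = Path(data_root) / "images"
    return {name: sum(1 for _ in (root / name).glob("*.jpg")) for name in EXPECTED}


def check_extraction(data_root):
    """Return a list of mismatch messages; an empty list means counts match."""
    counts = count_jpegs(data_root)
    return [
        f"{name}: expected {want}, found {counts[name]}"
        for name, want in EXPECTED.items()
        if counts[name] != want
    ]


if __name__ == "__main__":
    # Run from inside the extracted CraterBench-R directory.
    problems = check_extraction(".")
    print("OK" if not problems else "\n".join(problems))
```

Run it from the repository root after `unzip images.zip`. A mismatch usually points to an interrupted download or a partial extraction.
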
## Repository Layout

- `images.zip`: all benchmark images (gallery + query) in a single archive
- `splits/test.json`: official benchmark split with full gallery plus query set
- `splits/train.json`: train gallery with the full test relevance closure removed
- `metadata/query_relevance.json`: raw co-visible crater IDs and gallery-filtered relevance
- `metadata/stats.json`: release summary statistics
- `metadata/source/retrieval_ground_truth_raw.csv`: raw query relevance CSV for reference
- `examples/eval_timm_global.py`: minimal global-descriptor example for any `timm` image model
- `requirements.txt`: lightweight requirements for the example script

## Split Semantics

`splits/test.json` is the official benchmark split and includes both:

- the full gallery
- the query set evaluated against that gallery

The `ground_truth` field maps each query crater ID to gallery-present relevant crater IDs.

The raw query co-visibility information is preserved separately in `metadata/query_relevance.json`:

- `co_visible_ids_all`: all raw co-visible crater IDs from the source annotation
- `relevant_gallery_ids`: the subset that is present in the released gallery

`splits/train.json` is intended for supervised or metric-learning experiments. It excludes the full test relevance closure, not just the 1,000 direct query crater IDs.

## Official Counts

- test gallery: `25,000` crater IDs / `50,000` images
- test queries: `1,000` crater IDs / `5,000` images
- train gallery: `23,941` crater IDs / `47,882` images
- raw multi-ID query crater IDs: `428`
- gallery-present multi-ID query crater IDs: `59`

## Quick Start

```bash
pip install -r requirements.txt
unzip images.zip  # if not already extracted

python examples/eval_timm_global.py \
  --data-root . \
  --split test \
  --model vit_small_patch16_224.dino \
  --pool cls \
  --batch-size 64 \
  --device cuda
```

The example script:

- loads `splits/test.json`
- creates a pretrained `timm` model
- extracts one feature vector per image
- performs cosine retrieval against the released gallery
- reports Recall@1/5/10, mAP, and MRR

It is intentionally simple and meant as a working baseline rather than the fastest possible evaluator.

## Manifest Format

Each split JSON has the form:

```json
{
  "split_name": "train or test",
  "version": "release_v1",
  "gallery_images": [
    {
      "image_id": "02-1-002611_2x",
      "path": "images/gallery/02-1-002611_2x.jpg",
      "crater_id": "02-1-002611",
      "view_type": "2x"
    }
  ],
  "query_images": [
    {
      "image_id": "02-1-002927_view_1",
      "query_id": "02-1-002927",
      "path": "images/query/02-1-002927_view_1.jpg",
      "crater_id": "02-1-002927",
      "view": 1,
      "manual_verified": true
    }
  ],
  "ground_truth": {
    "02-1-002927": ["02-1-002927"]
  }
}
```

## Validation

The repository includes one small validation utility:

- `scripts/validate_release.py`: checks split consistency, relative paths, and train/test separation

To validate the package locally:

```bash
python scripts/validate_release.py
```

## Notes

- Image paths in manifests are relative and portable.
- `splits/test.json` is the official benchmark entrypoint.
- `splits/train.json` is intended for training or fine-tuning experiments.
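
For readers who want to see how cosine retrieval and the reported metrics (Recall@k, mAP, MRR) fit together with the manifest's `ground_truth` field, the following pure-Python sketch may help. It is not the code in `examples/eval_timm_global.py`: the function names `cosine`, `rank_gallery`, and `evaluate` are hypothetical, and two choices are assumptions made here. A gallery image counts as relevant when its `crater_id` appears in the query's `ground_truth` list, and Recall@k is 1 for a query if any relevant image appears in the top k. The official script may aggregate differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_gallery(query_vec, gallery):
    """Return gallery crater IDs sorted by descending cosine similarity.

    `gallery` is a list of (crater_id, feature_vector) pairs, one per image.
    """
    scored = sorted(gallery, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [crater_id for crater_id, _ in scored]


def evaluate(rankings, relevant, ks=(1, 5, 10)):
    """Compute Recall@k, mAP, and MRR.

    rankings: list of (query_crater_id, ranked list of gallery crater IDs)
    relevant: dict shaped like the manifest's `ground_truth` field
    """
    hits = {k: 0 for k in ks}
    ap_sum = rr_sum = 0.0
    for qid, ranked in rankings:
        rel = set(relevant[qid])
        flags = [cid in rel for cid in ranked]
        for k in ks:
            hits[k] += any(flags[:k])  # assumption: binary hit within top k
        # Average precision: mean of precision at each relevant rank,
        # normalised by the number of relevant images in the ranked list.
        found, precisions = 0, []
        for rank, is_rel in enumerate(flags, start=1):
            if is_rel:
                found += 1
                precisions.append(found / rank)
        ap_sum += sum(precisions) / len(precisions) if precisions else 0.0
        # Reciprocal rank of the first relevant image.
        first = next((r for r, f in enumerate(flags, start=1) if f), None)
        rr_sum += 1.0 / first if first else 0.0
    n = len(rankings)
    out = {f"recall@{k}": hits[k] / n for k in ks}
    out["mAP"] = ap_sum / n
    out["MRR"] = rr_sum / n
    return out


# Toy data (made up for illustration): two images of crater A, one of B, one of C.
toy_gallery = [("A", [1.0, 0.0]), ("A", [0.9, 0.1]), ("B", [0.0, 1.0]), ("C", [0.7, 0.7])]
toy_queries = [("A", [1.0, 0.05]), ("B", [0.6, 0.8])]
toy_ground_truth = {"A": ["A"], "B": ["B"]}

toy_rankings = [(qid, rank_gallery(vec, toy_gallery)) for qid, vec in toy_queries]
print(evaluate(toy_rankings, toy_ground_truth, ks=(1, 2)))
```

At the scale of the real benchmark (5,000 queries against 50,000 gallery images), a vectorised implementation over L2-normalised feature matrices would be the practical choice. The loops here are only meant to make each metric's definition explicit.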
Provider:
jfang