
SUSTech/Mars-VL-Pairs

Source: Hugging Face. Updated 2026-02-17. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/SUSTech/Mars-VL-Pairs
Resource description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- image-to-text
- text-to-image
dataset_info:
  features:
  - name: key
    dtype: string
  - name: image_url
    dtype: string
  - name: ori_caption
    dtype: string
  - name: refined_caption
    dtype: string
  splits:
  - name: train
    num_bytes: 893082
    num_examples: 2287
  download_size: 485670
  dataset_size: 893082
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- Multimodal
- Retrieval
- planet
---

# Paired Image–Text Retrieval

[**Paper**](https://huggingface.co/papers/2602.13961) | [**Code**](https://github.com/ml-stat-Sustech/MarsRetrieval)

## Dataset Summary

This dataset is Task 1 of [**MarsRetrieval**](https://github.com/ml-stat-Sustech/MarsRetrieval), a retrieval-centric benchmark for evaluating vision-language models (VLMs) on Mars geospatial discovery. Task 1 evaluates **fine-grained image–text alignment** using curated Martian image–text pairs spanning multiple spatial scales, from global orbital mosaics to rover-level imagery.

The dataset contains **2,287** paired image–text samples.

For details about dataset construction and evaluation protocol, please refer to the [official repository](https://github.com/ml-stat-Sustech/MarsRetrieval/blob/main/docs/DATASET.md).

## Task Formulation

We formulate this task as a **bidirectional one-to-one retrieval** benchmark:

- **Text → Image** retrieval
- **Image → Text** retrieval

### Metrics

We report standard retrieval metrics:

- Recall@1 (R@1)
- Recall@10 (R@10)
- Mean Reciprocal Rank (MRR)
- Median Rank (MedR)

## How to Use

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("SUSTech/Mars-VL-Pairs")

# Access a sample
print(dataset["train"][0]["refined_caption"])
```

For detailed instructions on the retrieval-centric protocol and official evaluation scripts, please refer to our [Official Dataset Documentation](https://github.com/ml-stat-Sustech/MarsRetrieval/blob/main/docs/DATASET.md).
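
As an illustration of the metrics listed under Task Formulation, the following is a minimal sketch of how R@1, R@10, MRR, and MedR could be computed for a one-to-one retrieval task from a square similarity matrix. It is not the official evaluation script. The function name `retrieval_metrics`, the toy similarity matrix, and the tie-handling rule (ties are resolved in favor of the correct candidate) are assumptions made for this example; in practice the matrix would come from a VLM's text and image embeddings.

```python
import numpy as np


def retrieval_metrics(similarity: np.ndarray, ks=(1, 10)) -> dict:
    """Compute one-to-one retrieval metrics from a square similarity matrix.

    similarity[i, j] is the score between query i and candidate j.
    Assumption: the ground-truth match for query i is candidate i.
    """
    n = similarity.shape[0]
    # Score that each query assigns to its own ground-truth candidate.
    correct = similarity[np.arange(n), np.arange(n)]
    # Rank = 1 + number of candidates scoring strictly higher than the
    # correct one (ties are resolved in favor of the correct candidate).
    ranks = 1 + (similarity > correct[:, None]).sum(axis=1)

    metrics = {f"R@{k}": float((ranks <= k).mean()) for k in ks}
    metrics["MRR"] = float((1.0 / ranks).mean())
    metrics["MedR"] = float(np.median(ranks))
    return metrics


# Hypothetical toy scores: rows are captions, columns are images.
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
])

text_to_image = retrieval_metrics(sim)     # each caption retrieves images
image_to_text = retrieval_metrics(sim.T)   # each image retrieves captions
print("Text -> Image:", text_to_image)
print("Image -> Text:", image_to_text)
```

Transposing the matrix switches the retrieval direction, so a single score matrix covers both Text → Image and Image → Text evaluation. For the exact protocol used in the benchmark, the official documentation linked above remains the reference.
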
## Citation

If you find this useful in your research, please consider citing:

```bibtex
@article{wang2026marsretrieval,
  title={MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars},
  author={Wang, Shuoyuan and Wang, Yiran and Wei, Hongxin},
  journal={arXiv preprint arXiv:2602.13961},
  year={2026}
}
```
Provided by:
SUSTech