XTD-10
Source: ModelScope community (updated 2025-12-05, indexed 2025-02-15)
Download link:
https://modelscope.cn/datasets/AI-ModelScope/XTD-10
# XTD Multimodal Multilingual Data With Instruction
This dataset contains evaluation subsets (**with English instructions**) for measuring the multilingual capability of multimodal embedding models. It covers seven languages:
- **it**, **es**, **ru**, **zh**, **pl**, **tr**, **ko**
## Dataset Usage
- The instruction on the query side is: "Retrieve an image of this caption."
- The instruction on the document side is: "Represent the given image."
- Each example contains a query and a list of candidate targets. The first candidate in the list is the ground-truth target.
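As a rough illustration of how these conventions fit together, the sketch below prepends the query instruction to a caption and scores an example by checking whether the first candidate (the ground truth) ranks in the top k. The helper names, the way the instruction is joined to the caption, and the plain list-of-scores layout are assumptions for illustration only; the actual prompt template and field names depend on the embedding model and the subset files.

```python
from typing import Sequence

QUERY_INSTRUCTION = "Retrieve an image of this caption."
DOC_INSTRUCTION = "Represent the given image."


def build_query_text(caption: str) -> str:
    # Assumption: instruction and caption are joined with a single space.
    return f"{QUERY_INSTRUCTION} {caption}"


def hit_at_k(scores: Sequence[float], k: int = 1) -> bool:
    """Return True if candidate 0 (the ground truth) ranks in the top k.

    scores[i] is the similarity between the query and candidate i.
    Ties are resolved in favor of the ground truth.
    """
    gt = scores[0]
    # Count candidates scored strictly higher than the ground truth.
    better = sum(1 for s in scores[1:] if s > gt)
    return better < k


def recall_at_k(all_scores: Sequence[Sequence[float]], k: int = 1) -> float:
    # Fraction of examples whose ground truth lands in the top k.
    hits = sum(hit_at_k(s, k) for s in all_scores)
    return hits / len(all_scores)
```

Because the ground truth always sits at index 0, no separate label array is needed; only the per-candidate similarity scores are.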
## Image Preparation
First, you should prepare the images used for evaluation:
### Image Downloads
[**XTD10 images**](https://huggingface.co/datasets/Haon-Chen/XTD-10/resolve/main/XTD10_dataset.tar.gz)
```
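# Create the image directory and download the archive into it.
mkdir -p images && cd images
wget https://huggingface.co/datasets/Haon-Chen/XTD-10/resolve/main/XTD10_dataset.tar.gz
# Decompress with pigz on 8 threads (requires pigz; plain `tar -xzf XTD10_dataset.tar.gz` also works).
tar -I "pigz -d -p 8" -xf XTD10_dataset.tar.gz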
```
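If `pigz` is not installed, the same archive can be unpacked with Python's standard library. The following is a minimal sketch, assuming the archive has already been downloaded; the function name `extract_images` and its default destination are illustrative choices, not part of the dataset tooling.

```python
import tarfile
from pathlib import Path


def extract_images(archive: str, dest: str = "images") -> Path:
    """Extract a .tar.gz archive into `dest` and return the destination path."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    # "r:gz" decompresses gzip in a single thread (slower than pigz, same result).
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out)
    return out
```

For example, `extract_images("images/XTD10_dataset.tar.gz", "images")` would mirror the shell commands above.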
### Image Organization
```
images/
├── XTD10_dataset/
│   └── ... .jpg
```
Refer to the image paths in each subset to see how the images are organized.
You can also customize the image paths by editing the `image_path` fields.
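One way to customize the paths is to swap the leading directory of every stored path for a new root. The sketch below assumes that each example is a dictionary, that path-bearing keys end in `image_path`, and that their values are either a string or a list of strings; this record layout and the helper names are hypothetical, so check the subset files for the real schema.

```python
from pathlib import PurePosixPath


def rebase_image_path(path: str, old_root: str, new_root: str) -> str:
    """Replace the leading `old_root` of `path` with `new_root`."""
    # relative_to raises ValueError if `path` does not start with `old_root`.
    rel = PurePosixPath(path).relative_to(old_root)
    return str(PurePosixPath(new_root) / rel)


def rebase_record(record: dict, old_root: str, new_root: str) -> dict:
    """Return a copy of `record` with every image path moved under `new_root`."""
    # Assumption: keys ending in "image_path" hold a path or a list of paths.
    out = dict(record)
    for key, value in record.items():
        if not key.endswith("image_path"):
            continue
        if isinstance(value, str):
            out[key] = rebase_image_path(value, old_root, new_root)
        else:
            out[key] = [rebase_image_path(v, old_root, new_root) for v in value]
    return out
```

Returning a copy leaves the loaded data untouched, so the original paths remain available for comparison.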
## Citation
If you use this dataset in your research, feel free to cite the original XTD paper and the mmE5 paper.
[mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data](https://huggingface.co/papers/2502.08468)
```
@article{chen2025mmE5,
title={mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data},
author={Chen, Haonan and Wang, Liang and Yang, Nan and Zhu, Yutao and Zhao, Ziliang and Wei, Furu and Dou, Zhicheng},
journal={arXiv preprint arXiv:2502.08468},
year={2025}
}
@article{XTD,
author = {Pranav Aggarwal and
Ajinkya Kale},
title = {Towards Zero-shot Cross-lingual Image Retrieval},
journal = {CoRR},
volume = {abs/2012.05107},
year = {2020},
url = {https://arxiv.org/abs/2012.05107},
eprinttype = {arXiv},
eprint = {2012.05107},
timestamp = {Sat, 02 Jan 2021 15:43:30 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2012-05107.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
Provider: maas
Created: 2025-02-12



