XTD-10

Name: XTD-10
Creator: maas
Published: 2025-12-05 16:23:20
License: No description available

ModelScope community. Updated: 2025-12-05. Indexed: 2025-02-15.

Download link:

https://modelscope.cn/datasets/AI-ModelScope/XTD-10


Resource description:

# XTD Multimodal Multilingual Data With Instruction

This dataset contains datasets (**with English instruction**) used for evaluating the multilingual capability of a multimodal embedding model, including seven languages:

- **it**, **es**, **ru**, **zh**, **pl**, **tr**, **ko**

## Dataset Usage

- The instruction on the query side is: "Retrieve an image of this caption."
- The instruction on the document side is: "Represent the given image."
- Each example contains a query and a set of targets. The first one in the candidate list is the groundtruth target.

## Image Preparation

First, you should prepare the images used for evaluation:

### Image Downloads

[**XTD10 images**](https://huggingface.co/datasets/Haon-Chen/XTD-10/resolve/main/XTD10_dataset.tar.gz)

```
mkdir -p images && cd images
wget https://huggingface.co/datasets/Haon-Chen/XTD-10/resolve/main/XTD10_dataset.tar.gz
tar -I "pigz -d -p 8" -xf XTD10_dataset.tar.gz
```

### Image Organization

```
images/
├── XTD10_dataset/
    └── ... .jpg
```

You can refer to the image paths in each subset to view the image organization. You can also customize your image paths by altering the image_path fields.

## Citation

If you use this dataset in your research, feel free to cite the original paper of XTD and the mmE5 paper.

[mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data](https://huggingface.co/papers/2502.08468)

```
@article{chen2025mmE5,
  title={mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data},
  author={Chen, Haonan and Wang, Liang and Yang, Nan and Zhu, Yutao and Zhao, Ziliang and Wei, Furu and Dou, Zhicheng},
  journal={arXiv preprint arXiv:2502.08468},
  year={2025}
}

@article{XTD,
  author = {Pranav Aggarwal and Ajinkya Kale},
  title = {Towards Zero-shot Cross-lingual Image Retrieval},
  journal = {CoRR},
  volume = {abs/2012.05107},
  year = {2020},
  url = {https://arxiv.org/abs/2012.05107},
  eprinttype = {arXiv},
  eprint = {2012.05107},
  timestamp = {Sat, 02 Jan 2021 15:43:30 +0100},
  biburl = {https://dblp.org/rec/journals/corr/abs-2012-05107.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
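To illustrate the convention from the Dataset Usage section (the first candidate in each list is the groundtruth target), here is a minimal sketch of a Recall@1 computation. It assumes you have already embedded each query and its candidates with your own model. The function names `cosine` and `recall_at_1`, the use of cosine similarity, and the toy vectors are illustrative assumptions and are not taken from the dataset or the mmE5 code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_1(examples):
    """Fraction of examples whose top-scoring candidate is index 0.

    Each example is a (query_vector, candidate_vectors) pair. Following the
    dataset's convention, candidate 0 is the groundtruth target.
    Ties are resolved in favour of the lowest index (an assumption here).
    """
    if not examples:
        raise ValueError("examples must not be empty")
    hits = 0
    for query_vec, candidate_vecs in examples:
        scores = [cosine(query_vec, c) for c in candidate_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best == 0
    return hits / len(examples)


# Toy embeddings standing in for real model outputs (hypothetical values).
toy_examples = [
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]),  # groundtruth scores highest
    ([0.0, 1.0], [[1.0, 0.0], [0.1, 0.9]]),  # a distractor scores highest
]
print(recall_at_1(toy_examples))  # prints 0.5
```

In a real evaluation the query vectors would come from captions encoded with the query-side instruction and the candidate vectors from images encoded with the document-side instruction. How the instruction is combined with the input depends on the embedding model and is not specified here.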


Provider:

maas

Created:

2025-02-12
