
XTD-10

ModelScope community. Updated 2025-12-05. Indexed 2025-02-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/XTD-10
Resource description:
# XTD Multimodal Multilingual Data With Instruction

This dataset contains the evaluation sets (**with English instructions**) used to assess the multilingual capability of a multimodal embedding model. It covers seven languages:

- **it**, **es**, **ru**, **zh**, **pl**, **tr**, **ko**

## Dataset Usage

- The instruction on the query side is: "Retrieve an image of this caption."
- The instruction on the document side is: "Represent the given image."
- Each example contains a query and a list of candidate targets. The first candidate in the list is the ground-truth target.

## Image Preparation

First, prepare the images used for evaluation.

### Image Downloads

[**XTD10 images**](https://huggingface.co/datasets/Haon-Chen/XTD-10/resolve/main/XTD10_dataset.tar.gz)

```
mkdir -p images && cd images
wget https://huggingface.co/datasets/Haon-Chen/XTD-10/resolve/main/XTD10_dataset.tar.gz
# Requires pigz (parallel gzip); -p 8 decompresses with 8 threads
tar -I "pigz -d -p 8" -xf XTD10_dataset.tar.gz
```

### Image Organization

```
images/
├── XTD10_dataset/
    └── ... .jpg
```

The image paths in each subset show how the images are organized. You can also use custom image paths by changing the `image_path` fields.

## Citation

If you use this dataset in your research, feel free to cite the original XTD paper and the mmE5 paper.

[mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data](https://huggingface.co/papers/2502.08468)

```
@article{chen2025mmE5,
  title={mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data},
  author={Chen, Haonan and Wang, Liang and Yang, Nan and Zhu, Yutao and Zhao, Ziliang and Wei, Furu and Dou, Zhicheng},
  journal={arXiv preprint arXiv:2502.08468},
  year={2025}
}

@article{XTD,
  author = {Pranav Aggarwal and Ajinkya Kale},
  title = {Towards Zero-shot Cross-lingual Image Retrieval},
  journal = {CoRR},
  volume = {abs/2012.05107},
  year = {2020},
  url = {https://arxiv.org/abs/2012.05107},
  eprinttype = {arXiv},
  eprint = {2012.05107},
  timestamp = {Sat, 02 Jan 2021 15:43:30 +0100},
  biburl = {https://dblp.org/rec/journals/corr/abs-2012-05107.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
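The Dataset Usage notes say that each example pairs a query with a candidate list whose first entry is the ground-truth target. A minimal sketch of how Recall@1 could be scored under that convention follows. It assumes the query and candidate embeddings have already been produced by some embedding model. The function names, the `(query_vector, candidate_vectors)` example layout, and the newline used to join instruction and caption are all illustrative assumptions, not part of the dataset or of mmE5.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Instruction strings quoted from the dataset card.
QUERY_INSTRUCTION = "Retrieve an image of this caption."
DOC_INSTRUCTION = "Represent the given image."

Vector = Sequence[float]
# Hypothetical layout: (query embedding, candidate embeddings), ground truth at index 0.
Example = Tuple[Vector, Sequence[Vector]]


def build_query_text(caption: str) -> str:
    """Prefix a caption with the query-side instruction.

    Joining with a newline is an assumption; the right template depends on the model.
    """
    return f"{QUERY_INSTRUCTION}\n{caption}"


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_hit_at_1(query_vec: Vector, candidate_vecs: Sequence[Vector]) -> bool:
    """True when the top-scoring candidate is index 0, the ground-truth target."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    # max() returns the first maximum, so exact ties resolve to the lowest index.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best == 0


def recall_at_1(examples: List[Example]) -> float:
    """Fraction of examples whose ground-truth candidate is ranked first."""
    if not examples:
        raise ValueError("need at least one example")
    return sum(is_hit_at_1(q, cands) for q, cands in examples) / len(examples)


# Toy vectors standing in for real embeddings.
toy = [
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]),  # ground truth scores highest: hit
    ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]),  # a distractor scores highest: miss
]
print(recall_at_1(toy))  # 0.5
```

Because ties resolve to index 0, this sketch slightly favors the ground truth when scores are exactly equal. A stricter evaluation might shuffle the candidates and track the ground-truth index explicitly.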

Provider:
maas
Created:
2025-02-12