mmE5-synthetic
Source: ModelScope community (魔搭社区). Updated 2025-05-22; indexed 2025-02-15.
Download link: https://modelscope.cn/datasets/AI-ModelScope/mmE5-synthetic
# mmE5 Synthetic Data
This dataset contains the synthetic data used to fine-tune mmE5 ([mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data](https://arxiv.org/abs/2502.08468)). It covers three task types:
- **Classification**
- **Retrieval**
- **VQA**
[GitHub](https://github.com/haon-chen/mmE5)
## Image Preparation
First, you should prepare the images used for training:
### Image Downloads
- **Download Links**: Download the image resources for the dataset via the following link:
  - [**LAION-images**](https://huggingface.co/datasets/Haon-Chen/mmE5-synthetic/blob/main/LAION_Synthetic.tar.gz)
### Image Organization
```
images/
├── laion/
│   └── ... .jpg
```
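As a minimal sketch of this step, the snippet below unpacks an already downloaded archive into an `images/` directory. The function name `extract_images` and the default target directory are illustrative assumptions, not part of the mmE5 repository. The archive's internal folder layout is not documented here, so the extracted files may need to be moved to match the `images/laion/` structure shown above.

```python
import tarfile
from pathlib import Path


def extract_images(archive_path, images_root="images"):
    """Extract a .tar.gz archive into images_root and return the .jpg files found there.

    Hypothetical helper: it makes no assumption about the folder names inside
    the archive and simply unpacks whatever the archive contains.
    """
    root = Path(images_root).resolve()
    root.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        # Reject member names that would resolve outside images_root.
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(root)

    # Collect every .jpg under images_root, at any depth.
    return sorted(root.rglob("*.jpg"))
```

For example, `extract_images("LAION_Synthetic.tar.gz")` would unpack the archive into `./images` and return the list of extracted `.jpg` paths, assuming the file has been downloaded to the working directory. `Path.is_relative_to` requires Python 3.9 or later.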
Refer to the image paths in each subset to see how the images are organized.
You can also customize the image paths by altering the `image_path` fields.
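One way to customize the paths is to rewrite their prefix when loading the records. The sketch below assumes each record is a dictionary and that every image path lives in a string field whose name ends with `image_path`. The helper name `remap_image_paths` and the field names in the example are hypothetical, so check the actual column names in each subset before using it.

```python
def remap_image_paths(record, old_prefix, new_prefix):
    """Return a copy of record with old_prefix replaced by new_prefix in image path fields.

    Assumption: image paths are stored as strings under keys ending in
    "image_path". Other fields are copied unchanged.
    """
    updated = dict(record)
    for key, value in record.items():
        is_path_field = key.endswith("image_path") and isinstance(value, str)
        if is_path_field and value.startswith(old_prefix):
            updated[key] = new_prefix + value[len(old_prefix):]
    return updated


# Illustrative record; the field names are placeholders, not the dataset schema.
example = {"text": "a cat on a sofa", "image_path": "images/laion/0001.jpg"}
remapped = remap_image_paths(example, "images/", "/data/mme5/images/")
print(remapped["image_path"])
```

Returning a new dictionary rather than editing in place keeps the original records intact, which makes it easy to apply the function inside a map-style preprocessing step.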
## Citation
If you use this dataset in your research, please cite the associated paper.
```
@article{chen2025mmE5,
title={mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data},
author={Chen, Haonan and Wang, Liang and Yang, Nan and Zhu, Yutao and Zhao, Ziliang and Wei, Furu and Dou, Zhicheng},
journal={arXiv preprint arXiv:2502.08468},
year={2025}
}
```
Provider: maas
Created: 2025-02-13



