Tongyi-ConvAI/MMEvol
Source: Hugging Face | Updated: 2024-11-30 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Tongyi-ConvAI/MMEvol
Description:
MMEvol-480K is a multimodal supervised fine-tuning (SFT) dataset generated by Tongyi-ConvAI and used to train the Evol-Llama3-8B-Instruct and Evol-Qwen2-7B models. It raises data quality through an iterative evolution process that produces more complex and diverse image-text instruction data. Starting from a 163K seed instruction-tuning set, it generates high-quality instruction data along three evolution directions: fine-grained perceptual evolution, interactive evolution, and cognitive reasoning evolution. The images are collected from open-source multimodal datasets, and the final file used for model training is mix_evol_sft.json.
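The entry does not document the internal layout of mix_evol_sft.json. The following is a minimal sketch for inspecting a local copy, assuming (not confirmed by this entry) that the file is a top-level JSON array of LLaVA-style records, each with an optional `image` path and a `conversations` list of `{"from": ..., "value": ...}` turns. The function name `summarize_sft_file` is hypothetical, and the schema should be checked against the actual file before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_sft_file(path):
    """Count records, image-bearing records, turns, and speaker roles.

    Assumes a hypothetical LLaVA-style schema: a top-level list of records,
    each with an optional "image" key and a "conversations" list of
    {"from": ..., "value": ...} dicts. Adjust if the real file differs.
    """
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")

    speakers = Counter()  # how many turns each role ("from" value) contributes
    with_image = 0        # records that reference an image file
    total_turns = 0       # total conversation turns across all records
    for rec in records:
        if rec.get("image"):
            with_image += 1
        turns = rec.get("conversations", [])
        total_turns += len(turns)
        speakers.update(turn.get("from", "?") for turn in turns)

    return {
        "records": len(records),
        "with_image": with_image,
        "turns": total_turns,
        "speakers": dict(speakers),
    }
```

For example, `summarize_sft_file("mix_evol_sft.json")` on a downloaded copy gives a quick sanity check that the record count is in the expected range and that every record has the image and conversation fields a training script would read.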
Provided by:
Tongyi-ConvAI



