LLaVA-Instruct-ru
收藏魔搭社区2025-12-05 更新2025-08-02 收录
下载链接:
https://modelscope.cn/datasets/deepvk/LLaVA-Instruct-ru
下载链接
链接失效反馈官方服务:
资源简介:
# LLaVA-Instruct-ru
Dataset similar to LLaVA instruct, but in Russian.
We follow the original pipeline to generate data and collect instruction with `conversation` and `complex_reasoning` types.
For more details, see original [paper](https://arxiv.org/abs/2304.08485).
Each row has 4 fields:
- `type`: `conversation` or `complex_reasoning`
- `conversations`: a list of dictionaries with utterances, each utterance contains `from` and `value` keys.
- `id`: image identifier in COCO, not unique
- `image`: path to the image in COCO dataset
Each `conversation` dialog contains several utterances. The human asks questions and clarifies previous answers, GPT responds to questions. Utterances are ordered, meaning that in the next utterance, the human or GPT may refer to previous ones.
In most dialogs, in the last utterance, GPT attempts to reason on the topic set by the human.
Each `complex_reasoning` dialog contains one question from the human and a detailed answer with reasoning from GPT.
See [`conversation_prompt.json`](./conversation_prompt.json) and [`complex_reasoning_prompt.json`](./complex_reasoning_prompt.json) for prompting details, including few-shot examples.
Train/val split corresponds to the train/val split in COCO 2014.
In train, all first utterances start with the `<image>` tag according to the original dataset; train remains unchanged and can be used with the LLaVA repository.
In val, there is no `<image>` tag.
The data was obtained using `gpt-3.5-turbo-0125`.
All utterances are written by the model, including human utterances.
The division of utterances into human and GPT is necessary for data usage and is not related to the procedure of obtaining them.
Filtering of the obtained data was carried out in 3 stages:
1. Removal of all rows that do not correspond to the dialog structure.
2. Removal of all utterances containing spelling errors. Subsequent utterances were also removed, as well as the human utterance if the following GPT utterance was removed.
3. Removal of rows with frequent errors, for the detection of which a heuristic was devised.
## Citation
```
@misc{liu2023llava,
title={Visual Instruction Tuning},
author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
publisher={NeurIPS},
year={2023},
}
```
```
@misc{deepvk2024llava_instruct_ru,
title={LLaVA-Instruct-ru},
author={Belopolskih, Daniil and Spirin, Egor},
url={https://huggingface.co/datasets/deepvk/LLaVA-Instruct-ru/},
publisher={Hugging Face}
year={2024},
}
```
# LLaVA-Instruct-ru
本数据集仿照LLaVA指令数据集(LLaVA instruct)构建,仅采用俄语编写。我们沿用原始数据生成流程,收集了两类指令:对话型(conversation)与复杂推理型(complex_reasoning)。如需了解更多细节,请参阅原始论文[https://arxiv.org/abs/2304.08485]。
每条数据包含4个字段:
- `type`:取值为`conversation`(对话型)或`complex_reasoning`(复杂推理型)
- `conversations`:由若干对话轮次字典组成的列表,每个对话轮次字典包含`from`与`value`两个键。
- `id`:COCO数据集的图像标识符,不具备唯一性。
- `image`:COCO数据集中对应图像的文件路径。
每段`conversation`(对话)包含多轮对话。其中人类用户发起提问并澄清过往回复,GPT模型负责作答。对话轮次存在严格顺序,后续轮次的人类或GPT发言可引用此前的对话内容。在多数对话中,GPT会在最后一轮发言中针对人类提出的主题展开推理。
每段`complex_reasoning`(复杂推理)对话仅包含一轮人类提问,以及GPT给出的附带详细推理过程的回复。如需了解提示词工程细节(包含少样本示例),请参阅`conversation_prompt.json`与`complex_reasoning_prompt.json`文件。
训练集/验证集划分与COCO 2014数据集的划分规则保持一致。训练集的首轮发言均按照原始数据集格式以`<image>`标签开头;训练集未做修改,可直接在LLaVA代码仓库中使用。验证集则不包含`<image>`标签。
本数据集通过`gpt-3.5-turbo-0125`模型生成。所有对话轮次(包括人类用户的发言)均由模型生成。将对话轮次划分为人类与GPT两类仅为方便数据集使用,与数据集的实际生成流程无关。
所获数据经过了三阶段过滤:
1. 移除所有不符合对话结构的数据行。
2. 移除所有包含拼写错误的对话轮次;若某轮发言被移除,则其后续所有轮次也一并删除;若GPT回复被移除,则对应的人类提问也需删除。
3. 移除存在高频错误的数据行,该类错误通过自定义启发式规则进行检测。
## 参考文献
@misc{liu2023llava,
title={视觉指令微调(Visual Instruction Tuning)},
author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
publisher={NeurIPS},
year={2023},
}
@misc{deepvk2024llava_instruct_ru,
title={LLaVA-Instruct-ru},
author={Belopolskih, Daniil and Spirin, Egor},
url={https://huggingface.co/datasets/deepvk/LLaVA-Instruct-ru/},
publisher={Hugging Face},
year={2024},
}
提供机构:
maas
创建时间:
2025-08-01



