Video-T3-QA
ModelScope community · updated 2025-12-10 · indexed 2025-03-01
Download link: https://modelscope.cn/datasets/MMInstruction/Video-T3-QA
---
license: apache-2.0
---
# Textual Temporal Understanding Dataset
Temporal Reasoning Transfer from Text to Video, ICLR 2025
Project Page: https://video-t3.github.io/
Each JSON file contains LLaVA-style text QA samples, generated with the synthesis method described in our paper.
For example:
```json
[
{
"from": "human",
"value": "Based on the following captions describing keyframes of a video, answer the next question.\n\nCaptions:\nThe image displays a circular emblem with a metallic appearance, conveying a sense of authority and power, suggesting it could be a seal or a logo.\nThe image displays a circular emblem with a metallic appearance, conveying a sense of elegance and sophistication, suggesting it could be a seal or a logo.\nQuestion: How does the conveyed sense of the emblem change in the video?\n(A) from elegance and sophistication to authority and power\n(B) from simplicity and modernity to complexity and tradition\n(C) from authority and power to elegance and sophistication\n(D) from complexity and tradition to simplicity and modernity\n\nProvide only the top choice:\n"
},
{
"from": "gpt",
"value": "(C) from authority and power to elegance and sophistication"
}
]
```
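As a rough illustration of how such a sample might be consumed, the sketch below splits a two-turn sample into captions, question, options, and answer letter. It assumes the prompt layout seen in the example above (a `Captions:` block, a `Question:` line, and `(A)` to `(D)` options); `parse_sample` is a hypothetical helper, not part of the released code, and the captions are shortened placeholders.

```python
import re

# Inline sample mirroring the structure shown above (captions shortened).
sample = [
    {
        "from": "human",
        "value": (
            "Based on the following captions describing keyframes of a video, "
            "answer the next question.\n\nCaptions:\nCaption one.\nCaption two.\n"
            "Question: How does the conveyed sense of the emblem change in the video?\n"
            "(A) from elegance and sophistication to authority and power\n"
            "(B) from simplicity and modernity to complexity and tradition\n"
            "(C) from authority and power to elegance and sophistication\n"
            "(D) from complexity and tradition to simplicity and modernity\n\n"
            "Provide only the top choice:\n"
        ),
    },
    {"from": "gpt", "value": "(C) from authority and power to elegance and sophistication"},
]


def parse_sample(turns):
    """Split a two-turn QA sample into captions, question, options, and answer letter.

    Assumes the prompt layout of the example; other files may differ.
    """
    prompt = next(t["value"] for t in turns if t["from"] == "human")
    reply = next(t["value"] for t in turns if t["from"] == "gpt")
    # Everything before "Question:" holds the instruction and the captions.
    head, _, tail = prompt.partition("Question:")
    captions = head.split("Captions:\n", 1)[1].strip().split("\n")
    question = tail.strip().split("\n", 1)[0]
    # Each option sits on its own line, e.g. "(A) some text".
    options = dict(re.findall(r"^\(([A-D])\) (.+)$", tail, flags=re.MULTILINE))
    answer = re.match(r"\(([A-D])\)", reply).group(1)
    return {"captions": captions, "question": question, "options": options, "answer": answer}


parsed = parse_sample(sample)
print(parsed["answer"], "->", parsed["options"][parsed["answer"]])
```

A parser like this is mainly useful for sanity checks, for example verifying that the answer letter always refers to an existing option before training.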
You can adapt the samples to your training codebase to enhance the temporal understanding ability of Video-LLMs.
Mixing the dataset with other image-text SFT samples helps mitigate potential forgetting issues.
The number of samples can be scaled up easily by following the method described in Sec. 3 of the paper.
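One possible way to mix the text QA samples with image-text SFT data is sketched below. The function name `mix_samples`, the `temporal_fraction` parameter, and its default value are assumptions made for illustration; the dataset card does not prescribe a mixing ratio.

```python
import random


def mix_samples(temporal_qa, image_text_sft, temporal_fraction=0.5, seed=0):
    """Return a shuffled mix where about `temporal_fraction` of items are temporal QA.

    All image-text samples are kept; temporal QA samples are subsampled without
    replacement. The default fraction is a placeholder, not a recommended value.
    """
    if not 0.0 < temporal_fraction < 1.0:
        raise ValueError("temporal_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Number of temporal samples needed to reach the target share of the mix.
    wanted = round(len(image_text_sft) * temporal_fraction / (1.0 - temporal_fraction))
    n_temporal = min(wanted, len(temporal_qa))
    mixed = rng.sample(temporal_qa, n_temporal) + list(image_text_sft)
    rng.shuffle(mixed)
    return mixed


# Toy usage with placeholder records standing in for real samples.
temporal = [("temporal", i) for i in range(100)]
image_text = [("image_text", i) for i in range(30)]
mixed = mix_samples(temporal, image_text, temporal_fraction=0.25)
```

Keeping every image-text sample and subsampling only the text QA side is a design choice of this sketch; the opposite direction works equally well if the image-text pool is the larger one.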
| Dataset | #Relevant Captions | #Distractor Captions | Description |
|---------|-------------------|---------------------|-------------|
| Order-GPT (N×) | 2~4 | N × 100 ± 50, N ∈ {1, 2, 4, 8} | Order-related questions generated by GPT-4. |
| Attribute (N×) | 2 | N × 100 ± 50, N ∈ {1, 2, 4, 8} | Attribute-related questions. |
| Order-Template (X) | 3~6 | 200 ± 50 | Order-related questions based on template X. |
| Referring | 3 | 200 ± 50 | Temporal referring questions. |
| Grounding | 3 | 200 ± 50 | Temporal grounding questions. |
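To make the caption counts in the table concrete, the sketch below draws a distractor count in the range N × 100 ± 50 and scatters the relevant captions among the distractors while keeping their relative order. This is only one reading of the table: the uniform draw, the random placement, and the names `sample_distractor_count` and `build_caption_context` are assumptions, and the paper's actual procedure (Sec. 3) may differ.

```python
import random


def sample_distractor_count(n_multiplier, rng, base=100, spread=50):
    """Draw a count from [N*base - spread, N*base + spread]; uniform is an assumption."""
    center = n_multiplier * base
    return rng.randint(center - spread, center + spread)


def build_caption_context(relevant, distractor_pool, n_multiplier, seed=0):
    """Scatter `relevant` captions among sampled distractors, preserving their order."""
    rng = random.Random(seed)
    count = min(sample_distractor_count(n_multiplier, rng), len(distractor_pool))
    distractors = rng.sample(distractor_pool, count)
    total = count + len(relevant)
    # Sorted slots guarantee the relevant captions keep their temporal order.
    slots = sorted(rng.sample(range(total), len(relevant)))
    slot_set = set(slots)
    rel_iter, dis_iter = iter(relevant), iter(distractors)
    context = [next(rel_iter) if i in slot_set else next(dis_iter) for i in range(total)]
    return context, slots


# Toy usage: two relevant captions, N = 2, placeholder distractors.
pool = [f"distractor caption {i}" for i in range(1000)]
relevant = ["first relevant caption", "second relevant caption"]
context, slots = build_caption_context(relevant, pool, n_multiplier=2)
```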
Mapping from dataset names to JSON files:
- Order-GPT: `order_train`
- Attribute: `attribute_train`
- Order-Template: `shuffle_phrase`, `shuffle_sentence`, `shuffle_prefix`
- Referring: `refer_begin_end_temp2any`
- Grounding: `refer_begin_end_any2temp`
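The mapping above can be captured in a small lookup table. The sketch below matches files by prefix, because the exact file names (for instance, any suffix used for the N× variants) are not stated here; `DATASET_TO_FILES` and `candidate_files` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Dataset name -> JSON file identifiers, copied from the list above.
DATASET_TO_FILES = {
    "Order-GPT": ["order_train"],
    "Attribute": ["attribute_train"],
    "Order-Template": ["shuffle_phrase", "shuffle_sentence", "shuffle_prefix"],
    "Referring": ["refer_begin_end_temp2any"],
    "Grounding": ["refer_begin_end_any2temp"],
}


def candidate_files(dataset, root="."):
    """List JSON files under `root` whose names start with one of the dataset's identifiers.

    Prefix matching is an assumption; adjust it to the actual file names.
    """
    prefixes = DATASET_TO_FILES[dataset]
    return sorted(
        path
        for path in Path(root).glob("*.json")
        if any(path.name.startswith(prefix) for prefix in prefixes)
    )
```

For example, `candidate_files("Order-Template", "path/to/download")` would return every matching file in that (placeholder) directory.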
## Citation
If you find this dataset helpful, please cite our paper:
```bibtex
@inproceedings{li2025videot3,
author={Li, Lei and Liu, Yuanxin and Yao, Linli and Zhang, Peiyuan and An, Chenxin and Wang, Lean and Sun, Xu and Kong, Lingpeng and Liu, Qi},
title={Temporal Reasoning Transfer from Text to Video},
booktitle = {ICLR 2025},
publisher = {OpenReview.net},
year = {2025},
url = {https://openreview.net/forum?id=sHAvMp5J4R}
}
```
Provider: maas
Created: 2025-02-26



