ngqtrung/raw_set_3372
Source: Hugging Face. Updated 2025-12-08; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/ngqtrung/raw_set_3372
Description:
# Raw Video Dataset (3372 Videos)
This dataset contains 3372 videos with both single-modality and cross-modality question-answering annotations.
## Dataset Structure
```
train/
├── metadata.json # Combined metadata with all annotations
├── videos_part_001.tar # Video files (part 1)
├── videos_part_002.tar # Video files (part 2)
└── ... # Additional parts (~5GB each)
```
## Metadata Format
The `metadata.json` file maps each video ID to its video path and annotations:
```json
{
  "video_id": {
    "video_path": "video_id.mp4",
    "single_modality": {
      "vision_only": { "question": "...", "choices": {}, "correct_answer": "..." },
      "vision_only_misleading": { ... },
      "audio_only": { ... },
      "audio_only_misleading": { ... }
    },
    "cross_modality": {
      "task0": { "variant_type": "default", "question": "...", ... },
      "task1": { "variant_type": "audio_misleading", ... },
      "task2": { "variant_type": "visual_misleading", ... }
    }
  }
}
```
## Question Types
### Single Modality
- **vision_only**: Questions about visual content only
- **vision_only_misleading**: Vision questions with misleading visual information
- **audio_only**: Questions about audio content only
- **audio_only_misleading**: Audio questions with misleading audio information
### Cross Modality
- **default**: Questions requiring both audio and visual understanding
- **audio_misleading**: Cross-modal questions with misleading audio
- **visual_misleading**: Cross-modal questions with misleading visuals
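
To make the two groups concrete, here is a minimal sketch that flattens one video's record into `(group, type, entry)` rows. It assumes the layout shown under Metadata Format. The helper name `iter_questions` and the sample record are illustrative and not part of the dataset.

```python
def iter_questions(video_data):
    """Yield (group, question_type, entry) for every question attached to one video."""
    # Single-modality entries are keyed directly by their type name.
    for q_type, entry in video_data.get("single_modality", {}).items():
        yield "single_modality", q_type, entry
    # Cross-modality entries are keyed task0, task1, ...; the type is in variant_type.
    for entry in video_data.get("cross_modality", {}).values():
        yield "cross_modality", entry.get("variant_type", "unknown"), entry


# Hypothetical record, shaped like the Metadata Format example (contents made up).
sample = {
    "video_path": "demo.mp4",
    "single_modality": {
        "vision_only": {"question": "q1"},
        "audio_only": {"question": "q2"},
    },
    "cross_modality": {
        "task0": {"variant_type": "default", "question": "q3"},
        "task1": {"variant_type": "audio_misleading", "question": "q4"},
    },
}

rows = [(group, q_type) for group, q_type, _ in iter_questions(sample)]
```

Iterating this way avoids hard-coding the `task0`/`task1`/`task2` keys, in case the mapping from task key to variant is not the same for every video.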
## Options
- Questions include options A, B, C, D
- Option E: "Vision details are wrong" (for vision questions) or "Audio details are wrong" (for audio questions)
- Option F: "Audio details are wrong" (only for cross-modality questions)
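
As a rough illustration of how the options might be used, the sketch below renders an entry as a multiple-choice prompt and checks a predicted letter. It rests on two assumptions the card does not confirm: that `choices` maps option letters to option text, and that `correct_answer` stores the option letter. The function names and the sample entry are hypothetical.

```python
def format_prompt(entry):
    """Render one question entry as a multiple-choice prompt string."""
    lines = [entry["question"]]
    # Assumption: `choices` maps option letters ("A", "B", ...) to option text.
    for letter in sorted(entry["choices"]):
        lines.append(f"{letter}. {entry['choices'][letter]}")
    return "\n".join(lines)


def is_correct(entry, predicted_letter):
    """Compare a predicted option letter with the stored answer, ignoring case."""
    # Assumption: `correct_answer` holds the option letter, not the option text.
    return predicted_letter.strip().upper() == entry["correct_answer"].strip().upper()


# Hypothetical entry; the question and option texts are made up for illustration.
example = {
    "question": "What colour is the car?",
    "choices": {
        "A": "Red",
        "B": "Blue",
        "C": "Green",
        "D": "White",
        "E": "Vision details are wrong",
    },
    "correct_answer": "B",
}

prompt = format_prompt(example)
```

If `correct_answer` turns out to hold the option text instead, `is_correct` would need to compare against `entry["choices"][predicted_letter]`.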
## Usage
### Extract Videos
```bash
# Create the target directory, then extract every tar part into it
mkdir -p videos
for tar_file in train/videos_part_*.tar; do
  tar -xf "$tar_file" -C videos/
done
```
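
Where a shell is not available, the same extraction can be sketched in Python with the standard `tarfile` module. The function name `extract_parts` and its default paths are assumptions that mirror the shell loop above.

```python
import tarfile
from pathlib import Path


def extract_parts(src_dir="train", dest_dir="videos"):
    """Extract every videos_part_*.tar found in src_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    # Use the safer "data" extraction filter when this Python version provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    parts = sorted(Path(src_dir).glob("videos_part_*.tar"))
    for part in parts:
        with tarfile.open(part) as archive:
            archive.extractall(dest, **kwargs)
    return len(parts)  # number of archives processed
```

Calling `extract_parts()` with the defaults reads `train/videos_part_*.tar` and writes into `videos/`. The return value can be compared with the tar file count listed under Statistics.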
### Load Metadata
```python
import json

# Load the combined annotations for all videos
with open('train/metadata.json', 'r', encoding='utf-8') as f:
    metadata = json.load(f)

# Access data for a video
video_id = "example_video_id"  # replace with a key present in metadata.json
video_data = metadata[video_id]
print(video_data['single_modality']['vision_only']['question'])
```
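
After loading, it may be worth confirming that every `video_path` resolves to an extracted file. The sketch below assumes that `video_path` is relative to the extraction directory and that the archives store files without subdirectories. Neither is stated in this card, and `missing_videos` is a hypothetical helper.

```python
from pathlib import Path


def missing_videos(metadata, video_dir="videos"):
    """Return the sorted video IDs whose video_path is not a file under video_dir."""
    root = Path(video_dir)
    return sorted(
        video_id
        for video_id, data in metadata.items()
        # Assumption: video_path is relative to the extraction directory.
        if not (root / data["video_path"]).is_file()
    )
```

An empty result from `missing_videos(metadata)` suggests that extraction completed. A long list usually means an archive part is missing or the files were extracted into a different directory.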
## Statistics
- **Total Videos**: 3372
- **Total Tar Files**: 18
- **Single Modality Questions**: 13488
- **Cross Modality Questions**: 10116
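
These totals are consistent with four single-modality questions and three cross-modality questions per video: 3372 × 4 = 13488 and 3372 × 3 = 10116. A loaded `metadata` dictionary can be checked against them with a short sketch like the one below, where `count_questions` is an illustrative name.

```python
def count_questions(metadata):
    """Return (videos, single_modality_questions, cross_modality_questions)."""
    single = sum(len(v.get("single_modality", {})) for v in metadata.values())
    cross = sum(len(v.get("cross_modality", {})) for v in metadata.values())
    return len(metadata), single, cross


# Expected totals if every video carries 4 single- and 3 cross-modality questions.
EXPECTED = (3372, 3372 * 4, 3372 * 3)
```

Comparing `count_questions(metadata)` with `EXPECTED` flags videos with missing or extra annotations.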
## Citation
If you use this dataset, please cite appropriately.
## License
Please check the original video sources for licensing information.
Provided by:
ngqtrung



