Context-as-Memory-Dataset
ModelScope Community, updated 2025-11-27, indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/KwaiVGI/Context-as-Memory-Dataset
Resource overview:
<div align="center">
<h1>Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval</h1>
<h1>SIGGRAPH Asia 2025</h1>
<p>
<a href="https://context-as-memory.github.io/">[Project page]</a>
<a href="https://arxiv.org/pdf/2506.03141">[ArXiv]</a>
<a href="https://huggingface.co/datasets/KwaiVGI/Context-as-Memory-Dataset">[Dataset]</a>
</p>
</div>
# File Structure
The dataset is distributed as split archive parts (`Context-as-Memory-Dataset_*`). To prepare it for use, concatenate the parts into a single zip file with the following command:
```bash
cat Context-as-Memory-Dataset_* > Context-as-Memory-Dataset.zip
```
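On systems without `cat` (for example, a plain Windows shell), the same merge and extraction can be done in Python. This is a minimal sketch, assuming that sorting the part names lexicographically yields the correct order, which is the same assumption the shell glob above relies on. `merge_parts` and `extract` are hypothetical helper names, not part of the dataset release.

```python
import glob
import shutil
import zipfile
from pathlib import Path


def merge_parts(pattern: str, output: str) -> Path:
    """Concatenate split-archive parts into one file, like `cat parts_* > output`."""
    # Assumption: lexicographic order of the part names is the correct order
    # (true for zero-padded or alphabetic suffixes, as the shell glob also assumes).
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    out_path = Path(output)
    with out_path.open("wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in 64 MiB chunks so multi-GB parts are never held in memory.
                shutil.copyfileobj(src, out, 64 * 1024 * 1024)
    return out_path


def extract(archive: Path, dest: str = ".") -> None:
    """Unpack the merged zip; zipfile can read ZIP64 archives larger than 4 GiB."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)


# Usage, from the directory holding the downloaded parts:
# merged = merge_parts("Context-as-Memory-Dataset_*", "Context-as-Memory-Dataset.zip")
# extract(merged, ".")
```

Streaming the copy keeps memory use constant regardless of part size, which matters for an archive of this scale.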
After extracting `Context-as-Memory-Dataset.zip`, the dataset will be organized as follows:
```
Context-as-Memory-Dataset
├── frames
│ ├── AncientTempleEnv_0
│ │ ├── 0000.png
│ │ ├── 0001.png
│ │ ├── 0002.png
│ │ └── ...
│ ├── AncientTempleEnv_1
│ │ ├── 0000.png
│ │ ├── 0001.png
│ │ ├── 0002.png
│ │ └── ...
│ └── ...
│
├── jsons
│ ├── AncientTempleEnv_0.json
│ ├── AncientTempleEnv_1.json
│ └── ...
│
├── overlap_labels
│ ├── AncientTempleEnv_0
│ │ ├── 0.json
│ │ ├── 1.json
│ │ ├── 2.json
│ │ └── ...
│ ├── AncientTempleEnv_1
│ │ ├── 0.json
│ │ ├── 1.json
│ │ ├── 2.json
│ │ └── ...
│ └── ...
│
└── captions.txt
```
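The layout above maps each (video, frame index) pair to a fixed set of paths. The following minimal sketch builds those paths and checks that an extracted video is complete. It assumes frames are numbered contiguously from 0 to 7,600 and that `overlap_labels/<env>/N.json` belongs to the frame image with the same index N (the tree suggests this, but confirm it on the data). Note that frame images use zero-padded four-digit names while overlap labels do not. All function names here are hypothetical.

```python
from pathlib import Path

FRAMES_PER_ENV = 7601  # frames per long video, as stated in the README


def frame_path(root: Path, env: str, idx: int) -> Path:
    # Frame images are zero-padded to four digits: 0000.png, 0001.png, ...
    return root / "frames" / env / f"{idx:04d}.png"


def pose_path(root: Path, env: str) -> Path:
    # One JSON file per long video holds the camera pose of every frame.
    return root / "jsons" / f"{env}.json"


def overlap_path(root: Path, env: str, idx: int) -> Path:
    # Overlap labels are not padded: 0.json, 1.json, ...
    return root / "overlap_labels" / env / f"{idx}.json"


def list_envs(root: Path) -> list[str]:
    """Names of the long-video subdirectories, e.g. AncientTempleEnv_0."""
    return sorted(p.name for p in (root / "frames").iterdir() if p.is_dir())


def check_env(root: Path, env: str, n_frames: int = FRAMES_PER_ENV) -> list[str]:
    """Return a description of every missing file for one video (empty if complete)."""
    problems = []
    if not pose_path(root, env).is_file():
        problems.append(f"missing {pose_path(root, env)}")
    for i in range(n_frames):
        for p in (frame_path(root, env, i), overlap_path(root, env, i)):
            if not p.is_file():
                problems.append(f"missing {p}")
    return problems
```

Running `check_env` over every name returned by `list_envs` after extraction is a cheap way to catch a truncated or partially merged archive.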
# Explanation of Dataset Parts
- **`frames/`**: 100 subdirectories, one per long video, each containing 7,601 video frame images (`0000.png`, `0001.png`, ...).
- **`jsons/`**: 100 JSON files, each storing the camera pose (position + rotation) of every frame in the corresponding long video.
- **`overlap_labels/`**: 100 subdirectories, each containing 7,601 JSON files, one per frame; each file lists the indices of the frames that overlap with that frame.
- **`captions.txt`**: Captions, each annotating a segment of a long video from a given start frame to an end frame.
- We also provide a simple script, `tools.py`, which converts a pose given as (x, y, z, yaw, pitch) into an RT (rotation and translation) matrix, and which can take a chosen frame as the reference frame and align the RT matrices of all other frames to its coordinate system.
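The README does not state the angle units, the axis order, or whether RT is camera-to-world or world-to-camera, so the sketch below is not a reproduction of `tools.py`. It assumes angles in degrees, a z-up world, yaw about the world z-axis followed by pitch about the camera's y-axis with zero roll, and camera-to-world matrices; compare its output with `tools.py` on a few frames before relying on it. `pose_to_rt` and `align_to_reference` are hypothetical names. Alignment uses `T_rel_i = inv(T_ref) @ T_i`, so the reference frame becomes the identity and every other pose is expressed in its coordinate system.

```python
import numpy as np


def pose_to_rt(x, y, z, yaw, pitch, degrees=True):
    """Build a 4x4 camera-to-world matrix from position and yaw/pitch.

    ASSUMED convention (verify against tools.py): z is up, R = Rz(yaw) @ Ry(pitch),
    roll is zero, and the translation column is the camera position.
    """
    if degrees:
        yaw, pitch = np.radians(yaw), np.radians(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rt = np.eye(4)
    rt[:3, :3] = rz @ ry
    rt[:3, 3] = [x, y, z]
    return rt


def align_to_reference(rts, ref_idx):
    """Re-express camera-to-world matrices in the coordinate system of frame ref_idx."""
    ref_inv = np.linalg.inv(rts[ref_idx])
    return np.stack([ref_inv @ rt for rt in rts])
```

If the stored matrices turn out to be world-to-camera extrinsics `E_i` instead, the equivalent alignment is `E_i @ inv(E_ref)`.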
Provider:
maas
Created:
2025-10-09
Dataset introduction

Background overview
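Context-as-Memory-Dataset is a dataset for scene-consistent interactive long video generation. It contains video frame images organized in 100 subdirectories, camera-pose JSON files, overlap-frame label JSON files, and video segment captions. The total dataset size is 339.72 GB, and it is intended for research on video generation and scene understanding.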
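The stated figures (100 long videos of 7,601 frames each, 339.72 GB in total) imply the following counts, which can help with storage planning. The page does not say whether "GB" means 10^9 or 2^30 bytes, so this sketch computes both, and the per-frame average also absorbs the pose, overlap-label, and caption files. With 760,100 frames in total, it comes to roughly 447 kB (decimal) or 469 KiB (binary) per frame.

```python
NUM_VIDEOS = 100
FRAMES_PER_VIDEO = 7601
TOTAL_SIZE_GB = 339.72  # as reported; the GB unit (10**9 vs 2**30 bytes) is unspecified

total_frames = NUM_VIDEOS * FRAMES_PER_VIDEO         # one PNG per frame
total_overlap_files = NUM_VIDEOS * FRAMES_PER_VIDEO  # one overlap-label JSON per frame

# Average on-disk footprint per frame, counting images and all metadata together.
bytes_per_frame_decimal = TOTAL_SIZE_GB * 10**9 / total_frames
bytes_per_frame_binary = TOTAL_SIZE_GB * 2**30 / total_frames

print(f"{total_frames:,} frames, {total_overlap_files:,} overlap-label files")
print(f"~{bytes_per_frame_decimal / 1e3:.0f} kB (decimal) or "
      f"~{bytes_per_frame_binary / 2**10:.0f} KiB (binary) per frame")
```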
The above content was collected and summarized by 遇见数据集.



