ReWatch
Source: ModelScope community (魔搭社区); updated 2025-12-26, indexed 2025-10-04
Download link:
https://modelscope.cn/datasets/zcccccz/ReWatch
# ReWatch Dataset
Our ReWatch-R1 project page: https://rewatch-r1.github.io/
## Video Source
Our videos come from other open-source datasets. Download the original videos from the following links.
VideoEspresso: https://huggingface.co/datasets/hshjerry0315/VideoEspresso_train_video \
Video-R1: https://huggingface.co/datasets/Video-R1/Video-R1-data \
MiraData: https://huggingface.co/datasets/TencentARC/MiraData \
Vript: https://huggingface.co/datasets/Mutonix/Vript \
VideoMarathon: https://huggingface.co/datasets/jylins/videomarathon
When using our data, you need to prepend the root directory of the video source to each video path:
```python
import json
import os

# Root directory of each video source on your machine
video_dirs = {
    "VideoEspresso": "/path/to/your.../VideoEspresso_train_video",
    "Video-R1": "/path/to/your.../Video-R1",
    "MiraData": "/path/to/your.../mira_data",
    "Vript": "/path/to/your.../vript_long_videos",
    "VideoMarathon": "/path/to/your.../VideoMarathon"
}

caption_file = "caption.json"
with open(caption_file, "r", encoding="utf-8") as f:
    caption_data = json.load(f)

for item in caption_data:
    source_name = item["source"]         # video source
    video_dir = video_dirs[source_name]  # root directory for that video source
    rel_video_path = item["video_path"]  # relative path
    abs_video_path = os.path.join(video_dir, rel_video_path)  # absolute path
```
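Before training or evaluation it can help to check which videos are still missing locally. The following is a minimal sketch, not part of the official tooling; `resolve_video_path` and `find_missing` are hypothetical helper names, and the sketch assumes each item carries the `source` and `video_path` fields shown above.

```python
import os

def resolve_video_path(item, video_dirs):
    """Join the root directory of the item's source with its relative video_path."""
    return os.path.join(video_dirs[item["source"]], item["video_path"])

def find_missing(items, video_dirs):
    """Return the items whose resolved video file does not exist on disk."""
    return [
        item for item in items
        if not os.path.isfile(resolve_video_path(item, video_dirs))
    ]
```

Calling `find_missing(caption_data, video_dirs)` after the loading step gives the list of entries whose videos still need to be downloaded.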
For videos from MiraData, the path is determined by `video_id` and `clip_id`. You can find the `video_id` and `clip_id` of each video in the official MiraData repository. For example, a MiraData video path in our dataset looks like this:
```python
video_id = "ubgB1yelZMY"
clip_id = "49850.19"
video_path_in_our_dataset = f"{video_id}/{clip_id}.mp4"
```
For videos from VideoMarathon, the path is determined by the video's `id` in the VideoMarathon dataset. Use this `id` to look up the download path of the original video in the VideoMarathon dataset.
For example, a VideoMarathon video path in our dataset looks like this:
```python
id = "videomarathon_03293135"
video_path_in_our_dataset = f"{id}.mp4"
```
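Going the other way, the identifiers needed to look up the original video can be recovered from a `video_path`. This is a small sketch under the assumption that all MiraData and VideoMarathon paths follow the two patterns shown above; `miradata_ids` and `videomarathon_id` are hypothetical helper names.

```python
from pathlib import PurePosixPath

def miradata_ids(video_path):
    """Split '<video_id>/<clip_id>.mp4' into (video_id, clip_id)."""
    path = PurePosixPath(video_path)
    # .stem drops only the final '.mp4', so a clip_id like '49850.19' stays whole
    return path.parent.name, path.stem

def videomarathon_id(video_path):
    """Strip the '.mp4' extension to recover the VideoMarathon id."""
    return PurePosixPath(video_path).stem
```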
## Data Format
### Caption
```json
{
  "video_path": "Youcook2/merged/4/6nVIgasiUtw.mkv",
  "duration": 227,
  "caption": "[00:00-00:05] The video opens with a ...",
  "caption_summary": "[00:00-00:05] The video introduces ...",
  "source": "VideoEspresso"
}
```
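The `caption` and `caption_summary` strings carry timestamps in square brackets. Assuming a caption is a sequence of `[MM:SS-MM:SS] text` segments as in the truncated sample above (the full format, for example for videos longer than an hour, is not confirmed here), a sketch for splitting it into timed segments could look like this. `parse_caption` is a hypothetical helper name.

```python
import re

# Matches a timestamp header such as "[00:00-00:05]"
SEGMENT = re.compile(r"\[(\d+):(\d{2})-(\d+):(\d{2})\]")

def parse_caption(caption):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    matches = list(SEGMENT.finditer(caption))
    segments = []
    for i, match in enumerate(matches):
        m1, s1, m2, s2 = map(int, match.groups())
        # The segment text runs until the next timestamp header (or the end)
        text_end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        text = caption[match.end():text_end].strip()
        segments.append((m1 * 60 + s1, m2 * 60 + s2, text))
    return segments
```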
### QA
We use `question_id` to uniquely identify a question. Its format is `{video_path}_{question_type}_multiple_choice` or `{video_path}_{question_type}_open_end`.
#### Multiple Choice
```json
{
  "question_id": "4Z4E0IBaaRg.mp4_numerical_reasoning_multiple_choice",
  "multiple_choice": true,
  "question_type": "numerical_reasoning",
  "question": "By how many seconds does the time interval between the player's second and third deaths exceed the time interval between the player's first and second deaths?\nA: 5 seconds\nB: 2 seconds\nC: 7 seconds\nD: 1 second",
  "question_without_options": "By how many seconds does the time interval between the player's second and third deaths exceed the time interval between the player's first and second deaths?",
  "mc_options": {
    "A": "5 seconds",
    "B": "2 seconds",
    "C": "7 seconds",
    "D": "1 second"
  },
  "answer": "D",
  "video_path": "4Z4E0IBaaRg.mp4",
  "source": "Vript"
}
```
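In the example above, `question` is `question_without_options` followed by one `KEY: text` line per entry in `mc_options`, and `answer` is a key into `mc_options`. The sketch below rebuilds the prompt and looks up the answer text; it assumes every multiple-choice record follows the same layout as this one example, and `format_question` and `answer_text` are hypothetical helper names.

```python
def format_question(question_without_options, mc_options):
    """Rebuild the full prompt: the question, then one 'KEY: text' line per option."""
    lines = [question_without_options]
    lines += [f"{key}: {text}" for key, text in mc_options.items()]
    return "\n".join(lines)

def answer_text(item):
    """Map the answer letter of a multiple-choice item to its option text."""
    return item["mc_options"][item["answer"]]
```

For the record above, `answer_text` returns `"1 second"`, which matches the `answer` of the corresponding open-ended question.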
#### Open End
```json
{
  "question_id": "4Z4E0IBaaRg.mp4_numerical_reasoning_open_end",
  "multiple_choice": false,
  "question_type": "numerical_reasoning",
  "question": "By how many seconds does the time interval between the player's second and third deaths exceed the time interval between the player's first and second deaths?",
  "answer": "1 second",
  "video_path": "4Z4E0IBaaRg.mp4",
  "source": "Vript"
}
```
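The `question_id` of both QA records above can be rebuilt from their other fields. Because `video_path` and `question_type` may themselves contain underscores, splitting an id on `_` is ambiguous; the sketch below therefore takes the known `video_path` as input when decomposing an id. `build_question_id` and `split_question_id` are hypothetical helper names, not part of the dataset tooling.

```python
SUFFIXES = {True: "multiple_choice", False: "open_end"}

def build_question_id(item):
    """Compose '{video_path}_{question_type}_{multiple_choice|open_end}'."""
    suffix = SUFFIXES[item["multiple_choice"]]
    return f'{item["video_path"]}_{item["question_type"]}_{suffix}'

def split_question_id(question_id, video_path):
    """Recover (question_type, answer_format) from an id whose video_path is known."""
    prefix = video_path + "_"
    if not question_id.startswith(prefix):
        raise ValueError("question_id does not start with the given video_path")
    rest = question_id[len(prefix):]
    for answer_format in SUFFIXES.values():
        if rest.endswith("_" + answer_format):
            return rest[: -len(answer_format) - 1], answer_format
    raise ValueError("question_id has no recognized format suffix")
```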
### CoT
`react_chain` is the interaction trajectory between the agent and the video, and `natural_chain` is the CoT data we use for training.
```json
{
  "question_id": "XD-Violence/2/Kingsman.The.Golden.Circle.2017__#02-12-27_02-14-03_label_A.mp4_object_recognition_open_end",
  "react_chain": [
    {
      "type": "thought",
      "number": 1,
      "content": "The question asks about named significant characters present among the guests at the grand church ceremony, aside from Eggsy, Princess Tilde, and Harry, and a distinct article of clothing worn by each. To answer this, I first need to locate the grand church ceremony in the video and identify the guests. I will start by retrieving the segment where the grand church ceremony occurs."
    },
    {
      "type": "action",
      "number": 1,
      "content": "segment_retrieval(focus_event=\"the grand church ceremony, specifically the wedding ceremony of Eggsy and Princess Tilde\")"
    },
    {
      "type": "observation",
      "number": 1,
      "content": "start_time: 00:53 \nend_time: 01:36"
    },
    {
      "type": "thought",
      "number": 2,
      "content": "I have identified the time segment for the grand church ceremony. Now, I need to identify the significant characters present among the guests, excluding Eggsy, Princess Tilde, and Harry. I will query the segment focusing on the guests to identify them and their clothing. Since the question asks for named characters, I will focus on recognizable faces that might be identified as specific characters from the franchise."
    },
    {
      "type": "action",
      "number": 2,
      "content": "segment_query(start_time=\"00:53\", end_time=\"01:36\", focus_aspect=\"guests and their distinct clothing, particularly looking for recognizable characters\")"
    },
    {
      "type": "observation",
      "number": 2,
      "content": "Between 00:53 and 01:36, the wedding guests include notable characters such as Agent Tequila (Channing Tatum), dressed in a denim jacket, and Agent Ginger (Halle Berry), wearing a black dress. The congregation is diverse and features other significant figures from the film's universe in distinct, recognizable attire."
    },
    {
      "type": "thought",
      "number": 3,
      "content": "The previous observation already identified two named significant characters, Agent Tequila and Agent Ginger, and described a distinct article of clothing for each. This information directly answers the question."
    },
    {
      "type": "answer",
      "content": "The two other named significant characters present among the guests are Agent Tequila, who is dressed in a denim jacket, and Agent Ginger, who is wearing a black dress.",
      "number": 4
    }
  ],
  "natural_chain": "To answer this, I first need to locate the grand church ceremony in the video and identify the guests. I will start by retrieving the segment where the grand church ceremony occurs. <action>I'll look for the segment focusing on the wedding ceremony of Eggsy and Princess Tilde.</action> <observation>I've identified the ceremony between 00:53 and 01:36.</observation> Now, I need to identify the significant characters present among the guests, excluding Eggsy, Princess Tilde, and Harry. I will query the segment focusing on the guests to identify them and their clothing. Since the question asks for named characters, I will focus on recognizable faces that might be identified as specific characters from the franchise. <action>I'll query the segment from 00:53 to 01:36, specifically looking for guests and their distinct clothing, particularly recognizable characters.</action> <observation>I see Agent Tequila, dressed in a denim jacket, and Agent Ginger, wearing a black dress.</observation> This information directly answers the question. <answer>The two other named significant characters present among the guests are Agent Tequila, who is dressed in a denim jacket, and Agent Ginger, who is wearing a black dress.</answer>"
}
```
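In the example above, `natural_chain` interleaves free-form reasoning with `<action>`, `<observation>`, and `<answer>` spans. A minimal sketch for splitting such a string back into typed steps follows. Labeling the untagged text as `"thought"` is an assumption made here to mirror the `type` values in `react_chain`, and `split_natural_chain` is a hypothetical helper name.

```python
import re

# Matches <action>...</action>, <observation>...</observation>, <answer>...</answer>
TAG = re.compile(r"<(action|observation|answer)>(.*?)</\1>", re.DOTALL)

def split_natural_chain(natural_chain):
    """Return a list of (kind, text) pairs in order of appearance."""
    parts = []
    pos = 0
    for match in TAG.finditer(natural_chain):
        # Untagged text before the tag is treated as a reasoning step
        thought = natural_chain[pos:match.start()].strip()
        if thought:
            parts.append(("thought", thought))
        parts.append((match.group(1), match.group(2).strip()))
        pos = match.end()
    tail = natural_chain[pos:].strip()
    if tail:
        parts.append(("thought", tail))
    return parts
```

Applied to a record, `split_natural_chain(record["natural_chain"])` yields a sequence that can be compared step by step against the `type` fields in `react_chain`.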
Provider:
maas
Created:
2025-09-28
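Dataset overview
The ReWatch dataset is a multi-source video dataset containing detailed video captions, question-answer pairs, and chain-of-thought data, suited to video understanding and analysis tasks. Users must download the original videos themselves and process them according to the JSON formats provided.
(Summary compiled by 遇见数据集.)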



