ReWatch

Name: ReWatch
Creator: maas
Published: 2025-12-26 00:06:55
License: No description provided

ModelScope community. Updated: 2025-12-26. Indexed: 2025-10-04.

Download link:

https://modelscope.cn/datasets/zcccccz/ReWatch


Resource description:

# ReWatch Dataset

Our ReWatch-R1 project page: https://rewatch-r1.github.io/

## Video Source

Our videos come from other open-source datasets. You need to download the original videos from the following links:

VideoEspresso: https://huggingface.co/datasets/hshjerry0315/VideoEspresso_train_video \
Video-R1: https://huggingface.co/datasets/Video-R1/Video-R1-data \
MiraData: https://huggingface.co/datasets/TencentARC/MiraData \
Vript: https://huggingface.co/datasets/Mutonix/Vript \
VideoMarathon: https://huggingface.co/datasets/jylins/videomarathon

When using our data, you need to prepend the root directory of the video source to each video path:

```python
import os
import json

# Root directory of each video source on your machine.
video_dirs = {
    "VideoEspresso": "/path/to/your.../VideoEspresso_train_video",
    "Video-R1": "/path/to/your.../Video-R1",
    "MiraData": "/path/to/your.../mira_data",
    "Vript": "/path/to/your.../vript_long_videos",
    "VideoMarathon": "/path/to/your.../VideoMarathon"
}

caption_file = "caption.json"
with open(caption_file, "r") as f:
    caption_data = json.load(f)

for item in caption_data:
    source_name = item["source"]  # Video source
    video_dir = video_dirs[source_name]  # Root directory corresponding to the video source
    rel_video_path = item["video_path"]  # Relative path
    abs_video_path = os.path.join(video_dir, rel_video_path)  # Absolute path
```

For videos from MiraData, we use `video_id` and `clip_id` to determine the path of a video. You can find the `video_id` and `clip_id` of each video in the official MiraData repository. For example, in our dataset, a video path from MiraData is:

```python
video_id = "ubgB1yelZMY"
clip_id = "49850.19"
video_path_in_our_dataset = f"{video_id}/{clip_id}.mp4"
```

For videos from VideoMarathon, we use their `id` in the VideoMarathon dataset to determine the path of a video. You need to use this `id` to find the download path of the original video in the VideoMarathon dataset. For example, in our dataset, a video path from VideoMarathon is:

```python
sample_id = "videomarathon_03293135"  # The "id" field in VideoMarathon
video_path_in_our_dataset = f"{sample_id}.mp4"
```

## Data Format

### Caption

```json
{
    "video_path": "Youcook2/merged/4/6nVIgasiUtw.mkv",
    "duration": 227,
    "caption": "[00:00-00:05] The video opens with a ...",
    "caption_summary": "[00:00-00:05] The video introduces ...",
    "source": "VideoEspresso"
}
```

### QA

Each question is uniquely identified by its `question_id`, which has the format `{video_path}_{question_type}_multiple_choice` or `{video_path}_{question_type}_open_end`.

#### Multiple Choice

```json
{
    "question_id": "4Z4E0IBaaRg.mp4_numerical_reasoning_multiple_choice",
    "multiple_choice": true,
    "question_type": "numerical_reasoning",
    "question": "By how many seconds does the time interval between the player's second and third deaths exceed the time interval between the player's first and second deaths?\nA: 5 seconds\nB: 2 seconds\nC: 7 seconds\nD: 1 second",
    "question_without_options": "By how many seconds does the time interval between the player's second and third deaths exceed the time interval between the player's first and second deaths?",
    "mc_options": {
        "A": "5 seconds",
        "B": "2 seconds",
        "C": "7 seconds",
        "D": "1 second"
    },
    "answer": "D",
    "video_path": "4Z4E0IBaaRg.mp4",
    "source": "Vript"
}
```

#### Open End

```json
{
    "question_id": "4Z4E0IBaaRg.mp4_numerical_reasoning_open_end",
    "multiple_choice": false,
    "question_type": "numerical_reasoning",
    "question": "By how many seconds does the time interval between the player's second and third deaths exceed the time interval between the player's first and second deaths?",
    "answer": "1 second",
    "video_path": "4Z4E0IBaaRg.mp4",
    "source": "Vript"
}
```

### CoT

`react_chain` is the interaction trajectory between the agent and the video, and `natural_chain` is the CoT data we use for training.
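The CoT example in this section carries a `question_id` but no `video_path` or `source` field, so linking a CoT record back to its video probably means looking up the matching QA entry. The sketch below shows one way to do that, assuming the QA and CoT records have already been loaded into Python lists and that every CoT record follows the example's layout. The helper names `build_question_id` and `attach_qa` are hypothetical and not part of the dataset.

```python
def build_question_id(video_path, question_type, multiple_choice):
    """Compose a question_id in the documented format."""
    suffix = "multiple_choice" if multiple_choice else "open_end"
    return f"{video_path}_{question_type}_{suffix}"


def attach_qa(cot_records, qa_records):
    """Pair each CoT record with the QA entry sharing its question_id."""
    qa_by_id = {qa["question_id"]: qa for qa in qa_records}
    # CoT records without a matching QA entry are skipped.
    return [
        (cot, qa_by_id[cot["question_id"]])
        for cot in cot_records
        if cot["question_id"] in qa_by_id
    ]


# Tiny in-memory stand-ins for the loaded QA and CoT files.
qa_records = [{
    "question_id": "4Z4E0IBaaRg.mp4_numerical_reasoning_open_end",
    "multiple_choice": False,
    "question_type": "numerical_reasoning",
    "answer": "1 second",
    "video_path": "4Z4E0IBaaRg.mp4",
    "source": "Vript",
}]
cot_records = [{
    "question_id": "4Z4E0IBaaRg.mp4_numerical_reasoning_open_end",
    "react_chain": [],
    "natural_chain": "",
}]

pairs = attach_qa(cot_records, qa_records)
for cot, qa in pairs:
    print(cot["question_id"], qa["source"], qa["video_path"])
```

Looking up by the full `question_id` avoids splitting the string, which would be ambiguous because both video paths and question types can contain underscores.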
```json
{
    "question_id": "XD-Violence/2/Kingsman.The.Golden.Circle.2017__#02-12-27_02-14-03_label_A.mp4_object_recognition_open_end",
    "react_chain": [
        {
            "type": "thought",
            "number": 1,
            "content": "The question asks about named significant characters present among the guests at the grand church ceremony, aside from Eggsy, Princess Tilde, and Harry, and a distinct article of clothing worn by each. To answer this, I first need to locate the grand church ceremony in the video and identify the guests. I will start by retrieving the segment where the grand church ceremony occurs."
        },
        {
            "type": "action",
            "number": 1,
            "content": "segment_retrieval(focus_event=\"the grand church ceremony, specifically the wedding ceremony of Eggsy and Princess Tilde\")"
        },
        {
            "type": "observation",
            "number": 1,
            "content": "start_time: 00:53 \nend_time: 01:36"
        },
        {
            "type": "thought",
            "number": 2,
            "content": "I have identified the time segment for the grand church ceremony. Now, I need to identify the significant characters present among the guests, excluding Eggsy, Princess Tilde, and Harry. I will query the segment focusing on the guests to identify them and their clothing. Since the question asks for named characters, I will focus on recognizable faces that might be identified as specific characters from the franchise."
        },
        {
            "type": "action",
            "number": 2,
            "content": "segment_query(start_time=\"00:53\", end_time=\"01:36\", focus_aspect=\"guests and their distinct clothing, particularly looking for recognizable characters\")"
        },
        {
            "type": "observation",
            "number": 2,
            "content": "Between 00:53 and 01:36, the wedding guests include notable characters such as Agent Tequila (Channing Tatum), dressed in a denim jacket, and Agent Ginger (Halle Berry), wearing a black dress. The congregation is diverse and features other significant figures from the film's universe in distinct, recognizable attire."
        },
        {
            "type": "thought",
            "number": 3,
            "content": "The previous observation already identified two named significant characters, Agent Tequila and Agent Ginger, and described a distinct article of clothing for each. This information directly answers the question."
        },
        {
            "type": "answer",
            "content": "The two other named significant characters present among the guests are Agent Tequila, who is dressed in a denim jacket, and Agent Ginger, who is wearing a black dress.",
            "number": 4
        }
    ],
    "natural_chain": "To answer this, I first need to locate the grand church ceremony in the video and identify the guests. I will start by retrieving the segment where the grand church ceremony occurs. <action>I'll look for the segment focusing on the wedding ceremony of Eggsy and Princess Tilde.</action> <observation>I've identified the ceremony between 00:53 and 01:36.</observation> Now, I need to identify the significant characters present among the guests, excluding Eggsy, Princess Tilde, and Harry. I will query the segment focusing on the guests to identify them and their clothing. Since the question asks for named characters, I will focus on recognizable faces that might be identified as specific characters from the franchise. <action>I'll query the segment from 00:53 to 01:36, specifically looking for guests and their distinct clothing, particularly recognizable characters.</action> <observation>I see Agent Tequila, dressed in a denim jacket, and Agent Ginger, wearing a black dress.</observation> This information directly answers the question. <answer>The two other named significant characters present among the guests are Agent Tequila, who is dressed in a denim jacket, and Agent Ginger, who is wearing a black dress.</answer>"
}
```
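The example record suggests two things a consumer might want to pull out: the tool call stored as a string in each `action` step of `react_chain`, and the `<action>`, `<observation>`, and `<answer>` spans embedded in `natural_chain`. The following is a minimal sketch of both, assuming every action string is a single Python-style call with keyword arguments and that tags are never nested, as in the example. The functions `parse_action` and `extract_tagged_spans` are hypothetical helpers, not part of the dataset or of ReWatch-R1.

```python
import ast
import re


def parse_action(call_text):
    """Split an action string into (tool_name, keyword_arguments)."""
    node = ast.parse(call_text, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple tool call: {call_text!r}")
    # literal_eval accepts only literals, so nothing in the string is executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


# Matches <action>...</action>, <observation>...</observation>, <answer>...</answer>.
TAG_PATTERN = re.compile(r"<(action|observation|answer)>(.*?)</\1>", re.DOTALL)


def extract_tagged_spans(natural_chain):
    """Return (tag, text) pairs in the order they appear in natural_chain."""
    return [(m.group(1), m.group(2).strip())
            for m in TAG_PATTERN.finditer(natural_chain)]


# A shortened action step modelled on the example record.
step = {
    "type": "action",
    "number": 2,
    "content": 'segment_query(start_time="00:53", end_time="01:36", '
               'focus_aspect="guests and their distinct clothing")',
}
tool, args = parse_action(step["content"])
print(tool, args["start_time"], args["end_time"])

# A shortened natural_chain modelled on the example record.
chain = (
    "I will start by retrieving the segment. "
    "<action>I'll look for the wedding ceremony.</action> "
    "<observation>I've identified the ceremony between 00:53 and 01:36.</observation> "
    "<answer>Agent Tequila and Agent Ginger.</answer>"
)
spans = extract_tagged_spans(chain)
for tag, text in spans:
    print(f"{tag}: {text}")
```

Text outside the tags (the free-form reasoning) is ignored by `extract_tagged_spans`; a caller that needs it could instead split the string on the same pattern.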


Provider:

maas

Created:

2025-09-28
