ductai199x/video-std-manip

Name: ductai199x/video-std-manip
Creator: ductai199x
Published: 2024-04-01 16:49:24
License: CC BY-NC-SA 4.0

Source: Hugging Face. Updated 2024-04-01; indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/ductai199x/video-std-manip

Resource overview:

---
license:
- cc-by-nc-sa-4.0
pretty_name: VSM
category:
- vcms (Video Camera Model Splicing)
- vpvm (Video Perceptually Visible Manipulation)
- vpim (Video Perceptually Invisible Manipulation)
category_size:
  videos: 4000
  frames: 120000
---

# Video Standard Manipulation Dataset

## Dataset Description

- **Paper:** [VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces](https://openaccess.thecvf.com/content/WACV2024/papers/Nguyen_VideoFACT_Detecting_Video_Forgeries_Using_Attention_Scene_Context_and_Forensic_WACV_2024_paper.pdf)
- **Total amount of data used:** approx. 15GB

This dataset is a collection of simple and traditional localized video manipulations, such as splicing, color correction, contrast enhancement, blurring, and noise addition. The dataset is designed for training and evaluating video manipulation detection models. We used it to train the VideoFACT model, a deep learning model that uses attention, scene context, and forensic traces to detect a wide variety of video forgery types, including splicing, editing, deepfake, and inpainting.

The dataset is divided into three parts: Video Camera Model Splicing (VCMS), Video Perceptually Visible Manipulation (VPVM), and Video Perceptually Invisible Manipulation (VPIM). Each part has a total of 4000 videos. Each video is 1 second (30 frames) long, has a resolution of 1920 x 1080, and is encoded using FFmpeg with the H.264 codec at CRF 23. Additionally, each part is split into training, validation, and testing sets that consist of 3200, 200, and 600 videos, respectively. More details about the dataset can be found in the paper.

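Because every clip is full HD, decoded videos are much larger in memory than on disk, which matters when choosing a batch size. The following back-of-the-envelope sketch uses only the figures stated above (30 frames, 1920 x 1080, 4000 videos per part). It assumes frames are held as dense float32 tensors with three color channels; the helper name `tensor_bytes` is hypothetical.

```python
from math import prod

# Figures stated in the dataset description.
FRAMES, HEIGHT, WIDTH, CHANNELS = 30, 1080, 1920, 3
VIDEOS_PER_PART = 4000
BYTES_PER_FLOAT32 = 4  # assumption: frames are stored as float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


video_bytes = tensor_bytes((FRAMES, CHANNELS, HEIGHT, WIDTH))  # T, C, H, W
mask_bytes = tensor_bytes((FRAMES, HEIGHT, WIDTH))             # T, H, W
frames_per_part = VIDEOS_PER_PART * FRAMES

print(f"frames per part:   {frames_per_part}")
print(f"one decoded video: {video_bytes / 2**20:.1f} MiB")
print(f"one decoded mask:  {mask_bytes / 2**20:.1f} MiB")
```

Under these assumptions a single decoded video occupies several hundred MiB, so small batch sizes are a reasonable starting point.
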
## Necessary Dependencies

```bash
pip install torch datasets decord fsspec
```

## Usage Example

The Video Standard Manipulation (VSM) Dataset can be downloaded and used as follows:

```py
import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader
import datasets
import decord
import fsspec

# make decord return torch tensors
decord.bridge.set_bridge("torch")

vsm_ds = datasets.load_dataset("ductai199x/video_std_manip", "vcms", trust_remote_code=True)  # or "vpvm" or "vpim"

# see structure
print(vsm_ds)


# custom dataset wrapper to load video faster
class VsmDsWrapper(Dataset):
    def __init__(self, ds: datasets.Dataset):
        self.ds = ds

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        example = self.ds[idx]
        vid_path = example["vid_path"]
        mask_path = example["mask_path"]
        label = example["label"]
        vid = decord.VideoReader(vid_path)[:].float() / 255.0
        if label == 1:
            mask = decord.VideoReader(mask_path)[:].float() / 255.0
        else:
            # authentic videos have no mask file, so use an all-zero mask
            mask = torch.zeros_like(vid)
        mask = (mask.mean(3) > 0.5).float()  # T, H, W
        vid = vid.permute(0, 3, 1, 2)  # T, H, W, C -> T, C, H, W
        return {
            "vid": vid,
            "mask": mask,
            "label": label,
        }


# custom iterable dataset wrapper in case you want to stream the dataset
class VsmIterDsWrapper(IterableDataset):
    def __init__(self, ds: datasets.IterableDataset):
        self.ds = ds

    def __iter__(self):
        for example in self.ds:
            vid_path = example["vid_path"]
            mask_path = example["mask_path"]
            label = example["label"]
            vid = decord.VideoReader(fsspec.open(vid_path, "rb").open())[:].float() / 255.0
            if label == 1:
                mask = decord.VideoReader(fsspec.open(mask_path, "rb").open())[:].float() / 255.0
            else:
                mask = torch.zeros_like(vid)
            mask = (mask.mean(3) > 0.5).float()  # T, H, W
            vid = vid.permute(0, 3, 1, 2)  # T, H, W, C -> T, C, H, W
            yield {
                "vid": vid,
                "mask": mask,
                "label": label,
            }


# Using a DataLoader is highly recommended to load the dataset faster
vsm_dl = DataLoader(VsmDsWrapper(vsm_ds["train"]), batch_size=2, num_workers=14, persistent_workers=True)
for batch in vsm_dl:
    vid = batch["vid"]
    mask = batch["mask"]
    label = batch["label"]
    print(vid.shape, mask.shape, label)
```

## Dataset Structure

### Data Instances

Some frame examples from this dataset:

#### VCMS

![vcms](vcms_example.jpg) ![vcms_mask](vcms_example_mask.jpg)

#### VPVM

![vpvm](vpvm_example.jpg) ![vpvm_mask](vpvm_example_mask.jpg)

#### VPIM

![vpim](vpim_example.jpg) ![vpim_mask](vpim_example_mask.jpg)

### Data Fields

The data fields are the same among all splits.

- **vid_path** (str): Path to the video file.
- **mask_path** (str): Path to the mask file. This is an empty string if the video is not manipulated.
- **label** (int): 1 if the video is manipulated, 0 otherwise.

### Data Splits

Each part (vcms, vpvm, vpim) has a total of 4000 videos. Each video is 1 second (30 frames) long, has a resolution of 1920 x 1080, and is encoded using FFmpeg with the H.264 codec at CRF 23. Additionally, each part is split into training, validation, and testing sets that consist of 3200, 200, and 600 videos, respectively.

## Dataset Creation

Each part in this dataset was made by applying different sets of standard manipulations to videos from the Video-ACID dataset. All three parts were made using a common procedure. First, we created binary ground-truth masks specifying the tamper regions for each video. These tamper regions correspond to multiple randomly chosen shapes with random sizes, orientations, and placements within a frame. Fake videos were created by choosing a mask, then manipulating content within the tamper region. Original videos were retained to form the set of authentic videos. All real and manipulated video frames were re-encoded as H.264 videos using FFmpeg with 30 FPS and a constant rate factor of 23.

Each part in this dataset corresponds to a different manipulation type. The Video Camera Model Splicing (VCMS) part contains videos with content spliced in from other videos. The Video Perceptually Visible Manipulation (VPVM) part contains content modified using common editing operations (e.g. contrast enhancement, smoothing, sharpening, and blurring) applied with strengths that can be visually detected. The Video Perceptually Invisible Manipulation (VPIM) part was made in a similar fashion to VPVM, but with much smaller manipulation strengths to create challenging forgeries. For each dataset, we made 3200 videos (96000 frames) for training, 200 videos (15600 frames) for validation, 600 videos (8400 frames) for testing. More details can be found in the paper.

## Additional Information

### Licensing Information

All datasets are licensed under the [Creative Commons Attribution, Non-Commercial, Share-alike license (CC BY-NC-SA)](https://creativecommons.org/licenses/by-nc-sa/4.0/).

### Citation Information

```
@InProceedings{Nguyen_2024_WACV,
    author    = {Nguyen, Tai D. and Fang, Shengbang and Stamm, Matthew C.},
    title     = {VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {8563-8573}
}
```

### Contribution

We thank the authors of the Video-ACID dataset (https://ieee-dataport.org/documents/video-acid) for their work.

### Contact

For any questions, please contact Tai Nguyen at [@ductai199x](https://github.com/ductai199x) or by [email](mailto:taiducnguyen.drexel@gmail.com).
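
As a rough illustration of the procedure described under Dataset Creation (pick a binary tamper mask, then manipulate only the content inside it), the sketch below composites a manipulated frame into an original frame with NumPy. It is not the authors' code. The function names, the use of axis-aligned rectangles only (the dataset also varies shape and orientation), the small frame size, and the brightness change standing in for a manipulation are all assumptions made for illustration.

```python
import numpy as np


def random_rect_mask(height, width, rng, n_shapes=3):
    """Binary mask made of a few randomly sized, randomly placed rectangles."""
    mask = np.zeros((height, width), dtype=bool)
    for _ in range(n_shapes):
        h = int(rng.integers(height // 8, height // 2))
        w = int(rng.integers(width // 8, width // 2))
        top = int(rng.integers(0, height - h + 1))
        left = int(rng.integers(0, width - w + 1))
        mask[top:top + h, left:left + w] = True
    return mask


def composite(original, manipulated, mask):
    """Take manipulated pixels inside the mask and original pixels elsewhere."""
    return np.where(mask[..., None], manipulated, original)


rng = np.random.default_rng(0)
frame = rng.random((108, 192, 3))            # stand-in for one H x W x C frame
brightened = np.clip(frame * 1.5, 0.0, 1.0)  # stand-in manipulation
mask = random_rect_mask(108, 192, rng)
fake = composite(frame, brightened, mask)

print(fake.shape, f"{mask.mean():.2%} of pixels tampered")
```

The mask doubles as the ground-truth localization target: pixels outside it are bit-identical to the original frame.
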

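The Video Standard Manipulation Dataset (VSM) contains simple, traditional localized video manipulations such as splicing, color correction, contrast enhancement, blurring, and noise addition. It is intended for training and evaluating video manipulation detection models, in particular the VideoFACT model. The dataset has three parts: Video Camera Model Splicing (VCMS), Video Perceptually Visible Manipulation (VPVM), and Video Perceptually Invisible Manipulation (VPIM). Each part contains 4000 videos; each video is 1 second long, has a resolution of 1920x1080, and is encoded with H.264. Each part is split into training, validation, and test sets of 3200, 200, and 600 videos, respectively.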

Provider:

ductai199x

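Summary of Original Information

Dataset Overview

Dataset Name

Name: Video Standard Manipulation Dataset
Abbreviation: VSM

Dataset Categories

VCMS (Video Camera Model Splicing)
VPVM (Video Perceptually Visible Manipulation)
VPIM (Video Perceptually Invisible Manipulation)

Dataset Size

Videos: 4000 (per part)
Frames: 120000 (per part)

Dataset Contents

Video length: 1 second per video (30 frames)
Resolution: 1920 x 1080
Encoding: FFmpeg, H.264 codec, CRF 23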
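
The source gives the encoding settings (FFmpeg, H.264, 30 FPS, CRF 23) but not the exact command line. A minimal sketch of a command with those settings might look as follows; the `libx264` encoder choice, the image-sequence input pattern, the file names, and the helper name `build_ffmpeg_cmd` are assumptions, not details taken from the dataset authors.

```python
import shlex


def build_ffmpeg_cmd(frame_pattern, out_path, fps=30, crf=23):
    """Build an ffmpeg command that encodes an image sequence as H.264."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-framerate", str(fps),  # frame rate of the input image sequence
        "-i", frame_pattern,     # e.g. frames/%04d.png (hypothetical layout)
        "-c:v", "libx264",       # H.264 encoder (assumed: libx264)
        "-crf", str(crf),        # constant rate factor
        out_path,
    ]


cmd = build_ffmpeg_cmd("frames/%04d.png", "out.mp4")
print(shlex.join(cmd))
# To actually encode, pass the list to subprocess.run(cmd, check=True).
```
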

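Dataset Structure

Data fields:
- vid_path (str): path to the video file
- mask_path (str): path to the mask file
- label (int): whether the video is manipulated (1 = manipulated, 0 = not manipulated)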
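
The three fields are related: according to the dataset card, `mask_path` is an empty string when the video is not manipulated. A small sanity check along those lines could be sketched as below. The function name `check_record` and the example paths are hypothetical, and the rule that a manipulated video always has a non-empty `mask_path` is an assumption inferred from the usage example rather than something the source states outright.

```python
def check_record(record):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    for key, expected in (("vid_path", str), ("mask_path", str), ("label", int)):
        if not isinstance(record.get(key), expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    if problems:
        return problems  # skip value checks when types are already wrong

    if record["label"] not in (0, 1):
        problems.append("label should be 0 or 1")
    if record["label"] == 0 and record["mask_path"] != "":
        problems.append("authentic video should have an empty mask_path")
    if record["label"] == 1 and record["mask_path"] == "":
        problems.append("manipulated video should have a mask_path")
    return problems


# Hypothetical records; real paths come from the dataset itself.
authentic = {"vid_path": "videos/real_0001.mp4", "mask_path": "", "label": 0}
broken = {"vid_path": "videos/fake_0001.mp4", "mask_path": "", "label": 1}

print(check_record(authentic))
print(check_record(broken))
```
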

Data Splits

Training set: 3200 videos
Validation set: 200 videos
Test set: 600 videos

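Dataset Use

Purpose: training and evaluating video manipulation detection models
Application: used to train the VideoFACT model, which uses attention, scene context, and forensic traces to detect a variety of video forgery types

License

License: Creative Commons Attribution, Non-Commercial, Share-alike license (CC BY-NC-SA)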
