
ductai199x/video-std-manip

Hugging Face, updated 2024-04-01, indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/ductai199x/video-std-manip
Resource description:
---
license:
- cc-by-nc-sa-4.0
pretty_name: VSM
category:
- vcms (Video Camera Model Splicing)
- vpvm (Video Perceptually Visible Manipulation)
- vpim (Video Perceptually Invisible Manipulation)
category_size:
  videos: 4000
  frames: 120000
---

# Video Standard Manipulation Dataset

## Dataset Description

- **Paper:** [VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces](https://openaccess.thecvf.com/content/WACV2024/papers/Nguyen_VideoFACT_Detecting_Video_Forgeries_Using_Attention_Scene_Context_and_Forensic_WACV_2024_paper.pdf)
- **Total amount of data used:** approx. 15GB

This dataset is a collection of simple, traditional localized video manipulations such as splicing, color correction, contrast enhancement, blurring, and noise addition. It is designed for training and evaluating video manipulation detection models. We used this dataset to train the VideoFACT model, a deep learning model that uses attention, scene context, and forensic traces to detect a wide variety of video forgery types, including splicing, editing, deepfakes, and inpainting.

The dataset is divided into three parts: Video Camera Model Splicing (VCMS), Video Perceptually Visible Manipulation (VPVM), and Video Perceptually Invisible Manipulation (VPIM). Each part has a total of 4000 videos. Each video is 1 second (30 frames) long, has a resolution of 1920 x 1080, and is encoded using FFmpeg with the H.264 codec at CRF 23. Additionally, each part is split into training, validation, and testing sets that consist of 3200, 200, and 600 videos, respectively. More details about the dataset can be found in the paper.
## Necessary Dependencies

```bash
pip install torch datasets decord fsspec
```

## Usage Example

The Video Standard Manipulation (VSM) Dataset can be downloaded and used as follows:

```py
import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader
import datasets
import decord
import fsspec

# make decord return torch tensors instead of its own array type
decord.bridge.set_bridge("torch")

# repo ID spelled as in the dataset URL
vsm_ds = datasets.load_dataset("ductai199x/video-std-manip", "vcms", trust_remote_code=True)  # or "vpvm" or "vpim"

# see structure
print(vsm_ds)


# custom dataset wrapper to load video faster
class VsmDsWrapper(Dataset):
    def __init__(self, ds: datasets.Dataset):
        self.ds = ds

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        example = self.ds[idx]
        vid_path = example["vid_path"]
        mask_path = example["mask_path"]
        label = example["label"]
        # decode all frames: T, H, W, C with values scaled to [0, 1]
        vid = decord.VideoReader(vid_path)[:].float() / 255.0
        if label == 1:
            mask = decord.VideoReader(mask_path)[:].float() / 255.0
        else:
            # authentic videos have no mask file, so use an all-zero mask
            mask = torch.zeros_like(vid)
        mask = (mask.mean(3) > 0.5).float()  # binarize: T, H, W
        vid = vid.permute(0, 3, 1, 2)  # T, H, W, C -> T, C, H, W
        return {
            "vid": vid,
            "mask": mask,
            "label": label,
        }


# custom iterable dataset wrapper in case you want to stream the dataset
class VsmIterDsWrapper(IterableDataset):
    def __init__(self, ds: datasets.IterableDataset):
        self.ds = ds

    def __iter__(self):
        for example in self.ds:
            vid_path = example["vid_path"]
            mask_path = example["mask_path"]
            label = example["label"]
            # open through fsspec so that remote paths can be read as file objects
            vid = decord.VideoReader(fsspec.open(vid_path, "rb").open())[:].float() / 255.0
            if label == 1:
                mask = decord.VideoReader(fsspec.open(mask_path, "rb").open())[:].float() / 255.0
            else:
                mask = torch.zeros_like(vid)
            mask = (mask.mean(3) > 0.5).float()  # binarize: T, H, W
            vid = vid.permute(0, 3, 1, 2)  # T, H, W, C -> T, C, H, W
            yield {
                "vid": vid,
                "mask": mask,
                "label": label,
            }


# We highly recommend using a DataLoader to load the dataset faster
vsm_dl = DataLoader(VsmDsWrapper(vsm_ds["train"]), batch_size=2, num_workers=14, persistent_workers=True)
for batch in vsm_dl:
    vid = batch["vid"]
    mask = batch["mask"]
    label = batch["label"]
    print(vid.shape, mask.shape, label)
```

## Dataset Structure

### Data Instances

Some frame examples from this dataset:

#### VCMS

![vcms](vcms_example.jpg)
![vcms_mask](vcms_example_mask.jpg)

#### VPVM

![vpvm](vpvm_example.jpg)
![vpvm_mask](vpvm_example_mask.jpg)

#### VPIM

![vpim](vpim_example.jpg)
![vpim_mask](vpim_example_mask.jpg)

### Data Fields

The data fields are the same among all splits.

- **vid_path** (str): Path to the video file.
- **mask_path** (str): Path to the mask file. This is an empty string if the video is not manipulated.
- **label** (int): 1 if the video is manipulated, 0 otherwise.

### Data Splits

Each part (vcms, vpvm, vpim) has a total of 4000 videos. Each video is 1 second (30 frames) long, has a resolution of 1920 x 1080, and is encoded using FFmpeg with the H.264 codec at CRF 23. Additionally, each part is split into training, validation, and testing sets that consist of 3200, 200, and 600 videos, respectively.

## Dataset Creation

Each part in this dataset was made by applying different sets of standard manipulations to videos from the Video-ACID dataset. All three parts were made using a common procedure. First, we created binary ground-truth masks specifying the tamper regions for each video. These tamper regions correspond to multiple randomly chosen shapes with random sizes, orientations, and placements within a frame. Fake videos were created by choosing a mask, then manipulating content within the tamper region. Original videos were retained to form the set of authentic videos. All real and manipulated video frames were re-encoded as H.264 videos using FFmpeg at 30 FPS with a constant rate factor of 23.

Each part in this dataset corresponds to a different manipulation type. The Video Camera Model Splicing (VCMS) part contains videos with content spliced in from other videos. The Video Perceptually Visible Manipulation (VPVM) part contains content modified using common editing operations (e.g. contrast enhancement, smoothing, sharpening, blurring) applied with strengths that can be visually detected. The Video Perceptually Invisible Manipulation (VPIM) part was made in a similar fashion to VPVM, but with much smaller manipulation strengths to create challenging forgeries. For each part, we made 3200 videos (96000 frames) for training, 200 videos (6000 frames) for validation, and 600 videos (18000 frames) for testing. More details can be found in the paper.

## Additional Information

### Licensing Information

All datasets are licensed under the [Creative Commons Attribution, Non-Commercial, Share-alike license (CC BY-NC-SA)](https://creativecommons.org/licenses/by-nc-sa/4.0/).

### Citation Information

```
@InProceedings{Nguyen_2024_WACV,
    author    = {Nguyen, Tai D. and Fang, Shengbang and Stamm, Matthew C.},
    title     = {VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {8563-8573}
}
```

### Contribution

We thank the authors of the Video-ACID dataset (https://ieee-dataport.org/documents/video-acid) for their work.

### Contact

For any questions, please contact Tai Nguyen at [@ductai199x](https://github.com/ductai199x) or by [email](mailto:taiducnguyen.drexel@gmail.com).

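The Video Standard Manipulation Dataset (VSM) is a collection of simple, traditional localized video manipulations such as splicing, color correction, contrast enhancement, blurring, and noise addition. It is intended for training and evaluating video manipulation detection models, in particular the VideoFACT model. The dataset has three parts: Video Camera Model Splicing (VCMS), Video Perceptually Visible Manipulation (VPVM), and Video Perceptually Invisible Manipulation (VPIM). Each part contains 4000 videos; each video is 1 second long, has a resolution of 1920x1080, and is encoded with H.264. Each part is split into training, validation, and test sets of 3200, 200, and 600 videos, respectively.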
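To make the idea of a localized manipulation concrete, here is a minimal sketch of editing only the pixels inside a binary tamper mask. It is not the authors' generation code: the helper names (`random_rect_mask`, `enhance_contrast`, `manipulate`), the single rectangular region, the contrast formula, and the tiny 9 x 16 grayscale frame are all assumptions chosen for illustration, and it uses plain Python lists to stay dependency-free.

```python
import random


def random_rect_mask(height, width, rng):
    """Binary mask (1 = tampered) containing one random axis-aligned rectangle."""
    h = rng.randint(1, height // 2)
    w = rng.randint(1, width // 2)
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return [
        [1 if top <= y < top + h and left <= x < left + w else 0 for x in range(width)]
        for y in range(height)
    ]


def enhance_contrast(pixel, strength):
    """Stretch a [0, 1] intensity around mid-gray, then clip back into range."""
    return min(1.0, max(0.0, 0.5 + strength * (pixel - 0.5)))


def manipulate(frame, mask, strength):
    """Apply the edit only where mask == 1; all other pixels are copied unchanged."""
    return [
        [enhance_contrast(p, strength) if m else p for p, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
frame = [[rng.random() for _ in range(16)] for _ in range(9)]  # toy grayscale frame
mask = random_rect_mask(9, 16, rng)
fake = manipulate(frame, mask, strength=1.5)
```

In this sketch, a large `strength` would correspond loosely to a visible manipulation and a value close to 1.0 to a nearly invisible one; the actual operations and strengths used for VPVM and VPIM are described in the paper, not here.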
Provided by:
ductai199x
Summary of original information

Dataset overview

Dataset name

  • Name: Video Standard Manipulation Dataset
  • Abbreviation: VSM

Dataset categories

  • VCMS (Video Camera Model Splicing)
  • VPVM (Video Perceptually Visible Manipulation)
  • VPIM (Video Perceptually Invisible Manipulation)

Dataset size

  • Videos: 4000 (per part)
  • Frames: 120000 (per part)

Dataset content

  • Video length: 1 second per video (30 frames)
  • Resolution: 1920 x 1080
  • Encoding: FFmpeg, H.264 codec, CRF 23
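As a rough illustration of the encoding settings listed above, the sketch below builds an FFmpeg argument list for turning an image sequence into an H.264 clip at 30 FPS and CRF 23. The helper `build_encode_command`, the input pattern `frames/%04d.png`, and the output name `clip.mp4` are hypothetical; the exact command the authors ran is not given in the dataset card, so this is only one plausible invocation.

```python
import shlex


def build_encode_command(frame_pattern, output_path, fps=30, crf=23):
    """Return an FFmpeg argument list for H.264 encoding of an image sequence."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # frame rate of the input image sequence
        "-i", frame_pattern,     # hypothetical input pattern, e.g. frames/0001.png
        "-c:v", "libx264",       # FFmpeg's H.264 encoder
        "-crf", str(crf),        # constant rate factor (quality target)
        output_path,
    ]


cmd = build_encode_command("frames/%04d.png", "clip.mp4")
print(shlex.join(cmd))  # printable form; pass `cmd` to subprocess.run to execute
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.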

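Dataset structure

  • Data fields:
    • vid_path (str): path to the video file
    • mask_path (str): path to the mask file
    • label (int): whether the video is manipulated (1 = manipulated, 0 = not manipulated)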
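The three fields above imply a simple invariant, given that the dataset card states `mask_path` is an empty string for unmanipulated videos. The following sketch checks it on plain dictionaries. `VsmRecord`, `is_consistent`, and the example paths are hypothetical names introduced here, not part of the dataset's loading script.

```python
from typing import TypedDict


class VsmRecord(TypedDict):
    vid_path: str
    mask_path: str
    label: int


def is_consistent(record: VsmRecord) -> bool:
    """A manipulated record needs a mask path; an authentic one must not have one."""
    if record["label"] == 1:
        return record["mask_path"] != ""
    if record["label"] == 0:
        return record["mask_path"] == ""
    return False  # labels other than 0 and 1 are not defined by the card


# hypothetical records for illustration only
fake_record: VsmRecord = {"vid_path": "a.mp4", "mask_path": "a_mask.mp4", "label": 1}
real_record: VsmRecord = {"vid_path": "b.mp4", "mask_path": "", "label": 0}
broken_record: VsmRecord = {"vid_path": "c.mp4", "mask_path": "", "label": 1}

results = [is_consistent(r) for r in (fake_record, real_record, broken_record)]
```

A check like this can be run once over a split before training, so that a missing mask surfaces as a clear data error rather than a decoding failure inside a data loader worker.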

Dataset splits

  • Training set: 3200 videos
  • Validation set: 200 videos
  • Test set: 600 videos
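The split sizes can be cross-checked against the stated per-part totals with a few lines of arithmetic. This sketch assumes every clip has exactly 30 frames, as the card states; only the split name `train` appears in the card's usage example, so `validation` and `test` are assumed labels.

```python
FRAMES_PER_VIDEO = 30  # 1 second at 30 FPS, per the dataset card

# videos per split within one part (vcms, vpvm, or vpim)
SPLITS = {"train": 3200, "validation": 200, "test": 600}

frames_per_split = {name: count * FRAMES_PER_VIDEO for name, count in SPLITS.items()}
total_videos = sum(SPLITS.values())
total_frames = sum(frames_per_split.values())

print(frames_per_split)
print(total_videos, total_frames)
```

Under that assumption the totals come to 4000 videos and 120000 frames per part, which matches the `category_size` values in the card's metadata.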

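Dataset usage

  • Purpose: training and evaluating video manipulation detection models
  • Application: used to train the VideoFACT model, which uses attention, scene context, and forensic traces to detect many types of video forgery

License

  • License: Creative Commons Attribution, Non-Commercial, Share-alike license (CC BY-NC-SA)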