ductai199x/video-std-manip
Hugging Face · updated 2024-04-01 · indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/ductai199x/video-std-manip
Resource summary:
---
license:
- cc-by-nc-sa-4.0
pretty_name: VSM
category:
- vcms (Video Camera Model Splicing)
- vpvm (Video Perceptually Visible Manipulation)
- vpim (Video Perceptually Invisible Manipulation)
category_size:
  videos: 4000
  frames: 120000
---
# Video Standard Manipulation Dataset
## Dataset Description
- **Paper:** [VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces](https://openaccess.thecvf.com/content/WACV2024/papers/Nguyen_VideoFACT_Detecting_Video_Forgeries_Using_Attention_Scene_Context_and_Forensic_WACV_2024_paper.pdf)
- **Total dataset size:** approx. 15 GB
This dataset is a collection of simple, traditional localized video manipulations such as splicing, color correction, contrast enhancement, blurring, and noise addition. It is designed for training and evaluating video manipulation detection models. We used it to train VideoFACT, a deep learning model that uses attention, scene context, and forensic traces to detect a wide variety of video forgery types, e.g. splicing, editing, deepfakes, and inpainting. The dataset is divided into three parts: Video Camera Model Splicing (VCMS), Video Perceptually Visible Manipulation (VPVM), and Video Perceptually Invisible Manipulation (VPIM). Each part contains 4000 videos. Each video is 1 second long (30 frames), has a resolution of 1920 x 1080, and is encoded with FFmpeg using the H.264 codec at CRF 23. Each part is split into training, validation, and testing sets of 3200, 200, and 600 videos, respectively. More details about the dataset can be found in the paper.
## Necessary Dependencies
```bash
pip install torch datasets decord fsspec
```
## Usage Example
The Video Standard Manipulation (VSM) Dataset can be downloaded and used as follows:
```py
import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader
import datasets
import decord
import fsspec

# make decord return torch tensors
decord.bridge.set_bridge("torch")

vsm_ds = datasets.load_dataset("ductai199x/video-std-manip", "vcms", trust_remote_code=True)  # or "vpvm" or "vpim"
# see structure
print(vsm_ds)


# custom dataset wrapper to load video faster
class VsmDsWrapper(Dataset):
    def __init__(self, ds: datasets.Dataset):
        self.ds = ds

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        example = self.ds[idx]
        vid_path = example["vid_path"]
        mask_path = example["mask_path"]
        label = example["label"]
        vid = decord.VideoReader(vid_path)[:].float() / 255.0  # T, H, W, C in [0, 1]
        if label == 1:
            mask = decord.VideoReader(mask_path)[:].float() / 255.0
        else:
            mask = torch.zeros_like(vid)  # authentic video: all-zero mask
        mask = (mask.mean(3) > 0.5).float()  # average channels and binarize -> T, H, W
        vid = vid.permute(0, 3, 1, 2)  # T, H, W, C -> T, C, H, W
        return {
            "vid": vid,
            "mask": mask,
            "label": label,
        }


# custom iterable dataset wrapper in case you want to stream the dataset;
# files are opened through fsspec so remote paths work as well
class VsmIterDsWrapper(IterableDataset):
    def __init__(self, ds: datasets.IterableDataset):
        self.ds = ds

    def __iter__(self):
        for example in self.ds:
            vid_path = example["vid_path"]
            mask_path = example["mask_path"]
            label = example["label"]
            vid = decord.VideoReader(fsspec.open(vid_path, "rb").open())[:].float() / 255.0
            if label == 1:
                mask = decord.VideoReader(fsspec.open(mask_path, "rb").open())[:].float() / 255.0
            else:
                mask = torch.zeros_like(vid)
            mask = (mask.mean(3) > 0.5).float()  # T, H, W
            vid = vid.permute(0, 3, 1, 2)  # T, H, W, C -> T, C, H, W
            yield {
                "vid": vid,
                "mask": mask,
                "label": label,
            }


# using a DataLoader with several workers is highly recommended for faster loading
vsm_dl = DataLoader(VsmDsWrapper(vsm_ds["train"]), batch_size=2, num_workers=14, persistent_workers=True)
for batch in vsm_dl:
    vid = batch["vid"]
    mask = batch["mask"]
    label = batch["label"]
    print(vid.shape, mask.shape, label)
```
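Decoding a whole clip produces a 30 x 3 x 1080 x 1920 float tensor per video, so decoding only some of the frames can save a lot of memory. The following sketch is not part of the dataset's loader. `clip_indices` and `load_clip` are hypothetical helpers, and the code relies on decord's `VideoReader.get_batch` with the torch bridge, as in the example above.

```py
from typing import List


def clip_indices(num_frames: int, clip_len: int, stride: int = 1, start: int = 0) -> List[int]:
    """Return clip_len frame indices starting at `start`, spaced `stride` apart."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    last = start + (clip_len - 1) * stride
    if start < 0 or last >= num_frames:
        raise ValueError(f"clip [{start}, {last}] does not fit in {num_frames} frames")
    return list(range(start, last + 1, stride))


def load_clip(vid_path: str, clip_len: int = 8, stride: int = 2):
    """Decode only the selected frames (hypothetical helper)."""
    import decord  # imported lazily so clip_indices() works without decord installed

    decord.bridge.set_bridge("torch")
    vr = decord.VideoReader(vid_path)
    idx = clip_indices(len(vr), clip_len, stride)
    frames = vr.get_batch(idx).float() / 255.0  # T, H, W, C
    return frames.permute(0, 3, 1, 2), idx      # T, C, H, W


# For a 30-frame video, 8 frames taken every 2nd frame cover indices 0..14.
print(clip_indices(30, 8, 2))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

If you subsample frames this way, read the mask video with the same `idx` so that frames and masks stay aligned.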
## Dataset Structure
### Data Instances
[Figure: example frames from this dataset, with panels for VCMS, VPVM, and VPIM]
### Data Fields
The data fields are the same among all splits.
- **vid_path** (str): Path to the video file
- **mask_path** (str): Path to the ground-truth mask video. This is an empty string if the video is not manipulated.
- **label** (int): 1 if the video is manipulated, 0 otherwise.
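These field conventions allow a simple consistency check. Authentic videos (`label == 0`) have an empty `mask_path`, and manipulated ones (`label == 1`) should point to a mask. The helper below is a hypothetical sketch and is not part of the dataset or the `datasets` API. The card implies, but does not state outright, that every manipulated example has a non-empty `mask_path`, so the check treats that as an assumption.

```py
from typing import Any, Dict


def check_example(example: Dict[str, Any]) -> None:
    """Raise ValueError if an example breaks the field conventions (hypothetical helper)."""
    label = example["label"]
    if label not in (0, 1):
        raise ValueError(f"unexpected label {label!r}")
    if not example["vid_path"]:
        raise ValueError("vid_path is empty")
    # Assumption: mask_path is non-empty exactly when the video is manipulated.
    has_mask = example["mask_path"] != ""
    if has_mask != (label == 1):
        raise ValueError(f"label={label} but mask_path={example['mask_path']!r}")


# Hypothetical records; real ones come from vsm_ds["train"] and the other splits.
check_example({"vid_path": "real_0001.mp4", "mask_path": "", "label": 0})
check_example({"vid_path": "fake_0001.mp4", "mask_path": "fake_0001_mask.mp4", "label": 1})
```

To check a whole split, loop over it, e.g. `for ex in vsm_ds["train"]: check_example(ex)`. Iterating a `datasets.Dataset` yields plain dicts like the ones above.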
### Data Splits
Each part (vcms, vpvm, vpim) contains 4000 videos. Each video is 1 second long (30 frames), has a resolution of 1920 x 1080, and is encoded with FFmpeg using the H.264 codec at CRF 23. Each part is split into training, validation, and testing sets of 3200, 200, and 600 videos, respectively.
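The split sizes and clip format determine per-split frame counts and the memory needed for one decoded clip. The short sketch below works through that arithmetic. It assumes the split names `train`, `validation`, and `test` (only `train` appears in the usage example) and float32 decoding as in `VsmDsWrapper`.

```py
FRAMES_PER_VIDEO = 30
HEIGHT, WIDTH, CHANNELS = 1080, 1920, 3
# Split names other than "train" are assumptions; the sizes come from the card.
SPLITS = {"train": 3200, "validation": 200, "test": 600}


def split_frame_counts(splits, frames_per_video=FRAMES_PER_VIDEO):
    """Number of frames in each split."""
    return {name: n * frames_per_video for name, n in splits.items()}


def clip_bytes(bytes_per_value=4, frames=FRAMES_PER_VIDEO, h=HEIGHT, w=WIDTH, c=CHANNELS):
    """Memory for one decoded clip (T, C, H, W): 4 bytes/value for float32, 1 for uint8."""
    return frames * c * h * w * bytes_per_value


counts = split_frame_counts(SPLITS)
print(counts)                # {'train': 96000, 'validation': 6000, 'test': 18000}
print(sum(counts.values()))  # 120000
print(clip_bytes() / 1e9)    # 0.746496 (GB per float32 clip)
print(clip_bytes(1) / 1e9)   # 0.186624 (GB per uint8 clip)
```

With `batch_size=2` in the `DataLoader` of the usage example, each batch holds roughly 1.5 GB of float32 video before the model runs. Keep this in mind when choosing the batch size and `num_workers`.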
## Dataset Creation
Each part in this dataset was made by applying different sets of standard manipulations to videos from the Video-ACID dataset. All three parts were made using a common procedure. First, we created binary ground-truth masks specifying the tamper regions for each video. These tamper regions correspond to multiple randomly chosen shapes with random sizes, orientations, and placements within a frame. Fake videos were created by choosing a mask, then manipulating content within the tamper region.
Original videos were retained to form the set of authentic videos.
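The mask procedure described above uses several random shapes with random sizes, orientations, and placements. It can be sketched with NumPy as follows. This is not the authors' generator: the shape types (filled ellipses and rectangles), the size range, and the number of shapes are all assumptions.

```py
import numpy as np


def random_shape_mask(height: int, width: int, n_shapes: int = 3, rng=None) -> np.ndarray:
    """Binary H x W mask formed by the union of randomly sized, rotated, and placed shapes.

    Hypothetical stand-in for the paper's generator: shape types, the 5-20% size
    range, and n_shapes are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    yy, xx = np.mgrid[0:height, 0:width].astype(np.float32)
    mask = np.zeros((height, width), dtype=bool)
    for _ in range(n_shapes):
        cy, cx = rng.uniform(0, height), rng.uniform(0, width)  # random placement
        ry = rng.uniform(0.05, 0.2) * height                     # random size
        rx = rng.uniform(0.05, 0.2) * width
        theta = rng.uniform(0, np.pi)                            # random orientation
        # rotate pixel offsets into the shape's own coordinate frame
        dy, dx = yy - cy, xx - cx
        u = dx * np.cos(theta) + dy * np.sin(theta)
        v = -dx * np.sin(theta) + dy * np.cos(theta)
        if rng.random() < 0.5:
            shape = (u / rx) ** 2 + (v / ry) ** 2 <= 1.0   # filled ellipse
        else:
            shape = (np.abs(u) <= rx) & (np.abs(v) <= ry)  # filled rectangle
        mask |= shape
    return mask.astype(np.uint8)


mask = random_shape_mask(1080, 1920, rng=np.random.default_rng(0))
print(mask.shape, mask.dtype)  # (1080, 1920) uint8
```

The text above does not say whether the original masks change from frame to frame. Repeating one mask over a 30-frame clip, e.g. `np.repeat(mask[None], 30, axis=0)`, is therefore also an assumption.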
All real and manipulated video frames were re-encoded as H.264 videos using FFmpeg at 30 FPS with a constant rate factor (CRF) of 23.
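The re-encoding step can be approximated with an FFmpeg call like the one built below. Only H.264, 30 FPS, and CRF 23 come from the text. The input frame pattern, the `libx264` encoder choice, `-pix_fmt yuv420p`, and the helper name `h264_encode_cmd` are assumptions.

```py
import shlex
from typing import List


def h264_encode_cmd(frame_pattern: str, out_path: str, fps: int = 30, crf: int = 23) -> List[str]:
    """Build an FFmpeg command that encodes numbered frames to H.264 (hypothetical helper)."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),  # frame rate of the input image sequence
        "-i", frame_pattern,     # e.g. "frames/%05d.png"
        "-c:v", "libx264",       # H.264 encoder
        "-crf", str(crf),        # constant rate factor (quality-based rate control)
        "-pix_fmt", "yuv420p",   # widely compatible chroma subsampling
        out_path,
    ]


cmd = h264_encode_cmd("frames/%05d.png", "out.mp4")
print(shlex.join(cmd))
# ffmpeg -y -framerate 30 -i frames/%05d.png -c:v libx264 -crf 23 -pix_fmt yuv420p out.mp4
# To run it: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems when the command is passed to `subprocess.run`.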
Each part of this dataset corresponds to a different manipulation type. The Video Camera Model Splicing (VCMS) part contains videos with content spliced in from other videos. The Video Perceptually Visible Manipulation (VPVM) part contains content modified using common editing operations such as contrast enhancement, smoothing, sharpening, and blurring, applied with strengths that can be visually detected. The Video Perceptually Invisible Manipulation (VPIM) part was made in a similar fashion to VPVM, but with much smaller manipulation strengths to create challenging forgeries. For each part, we made 3200 videos (96000 frames) for training, 200 videos (6000 frames) for validation, and 600 videos (18000 frames) for testing. More details can be found in the paper.
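All three manipulation types follow one pattern: pixels inside the tamper mask come from an edited or foreign source, and pixels outside the mask are left untouched. The NumPy sketch below shows this pattern for splicing and for a contrast edit at two strengths. It is not the authors' pipeline. The contrast formula, the strength values, and the helper names are assumptions, and the paper's actual operations and strength ranges may differ.

```py
import numpy as np


def splice(host: np.ndarray, donor: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """VCMS-style sketch: copy donor pixels into the host frame where mask == 1.

    host, donor: H x W x C uint8 frames; mask: H x W binary array.
    """
    m = mask.astype(bool)[..., None]  # H, W -> H, W, 1 so it broadcasts over channels
    return np.where(m, donor, host)


def local_contrast(frame: np.ndarray, mask: np.ndarray, strength: float) -> np.ndarray:
    """VPVM/VPIM-style sketch: stretch contrast around mid-gray inside the mask only.

    strength = 0 leaves the frame unchanged; larger values give a more visible edit.
    """
    x = frame.astype(np.float32)
    edited = np.clip((x - 127.5) * (1.0 + strength) + 127.5, 0, 255)
    m = mask.astype(bool)[..., None]
    return np.where(m, edited, x).round().astype(np.uint8)


# Hypothetical strengths: a clearly visible edit vs. a subtle one.
frame = np.full((4, 6, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 1
visible = local_contrast(frame, mask, strength=0.5)
subtle = local_contrast(frame, mask, strength=0.02)
print(visible[1, 2, 0], subtle[1, 2, 0], visible[0, 0, 0])  # 86 99 100
```

In a VCMS-like setting, `donor` would be a frame taken from another video. For VPVM/VPIM-like edits, the same masked blend can wrap other operations such as blurring or sharpening.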
## Additional Information
### Licensing Information
All datasets are licensed under the [Creative Commons Attribution, Non-Commercial, Share-alike license (CC BY-NC-SA)](https://creativecommons.org/licenses/by-nc-sa/4.0/).
### Citation Information
```
@InProceedings{Nguyen_2024_WACV,
author = {Nguyen, Tai D. and Fang, Shengbang and Stamm, Matthew C.},
title = {VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2024},
pages = {8563-8573}
}
```
### Contribution
We thank the authors of the Video-ACID dataset (https://ieee-dataport.org/documents/video-acid) for their work.
### Contact
For any questions, please contact Tai Nguyen at [@ductai199x](https://github.com/ductai199x) or by [email](mailto:taiducnguyen.drexel@gmail.com).
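The Video Standard Manipulation Dataset (VSM dataset) contains simple, traditional localized video manipulations such as splicing, color correction, contrast enhancement, blurring, and noise addition. It is intended for training and evaluating video manipulation detection models, in particular the VideoFACT model. The dataset is divided into three parts: Video Camera Model Splicing (VCMS), Video Perceptually Visible Manipulation (VPVM), and Video Perceptually Invisible Manipulation (VPIM). Each part contains 4000 videos. Each video is 1 second long, has a resolution of 1920x1080, and is encoded with H.264. The data is split into training, validation, and test sets of 3200, 200, and 600 videos, respectively.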
Provided by:
ductai199x
Summary of original information
Dataset overview
Dataset name
- Name: Video Standard Manipulation Dataset
- Short name: VSM
Dataset categories
- VCMS (Video Camera Model Splicing)
- VPVM (Video Perceptually Visible Manipulation)
- VPIM (Video Perceptually Invisible Manipulation)
Dataset size (per part)
- Number of videos: 4000
- Number of frames: 120000
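Dataset content
- Video length: 1 second (30 frames) per video
- Resolution: 1920 x 1080
- Encoding: FFmpeg, H.264 codec, CRF 23
Dataset structure
- Data fields:
  - vid_path (str): path to the video file
  - mask_path (str): path to the mask file
  - label (int): whether the video is manipulated (1 = manipulated, 0 = not manipulated)
Dataset splits (per part)
- Training set: 3200 videos
- Validation set: 200 videos
- Test set: 600 videos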
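Dataset purpose
- Goal: training and evaluating video manipulation detection models
- Application: used to train the VideoFACT model, which uses attention, scene context, and forensic traces to detect multiple types of video forgery
License
- License: Creative Commons Attribution, Non-Commercial, Share-alike license (CC BY-NC-SA)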
The above content was collected and summarized by 遇见数据集.



