Video_Description_Editing
Source: ModelScope community. Updated 2026-04-28; indexed 2024-05-15.
Download link: https://modelscope.cn/datasets/modelscope/Video_Description_Editing
Resource description:
# Dataset
VATEX-EDIT and EMMAD-EDIT are two datasets that support the **Video Description Editing** task. Each data sample is a quadruple *(video, command, reference caption, edited caption)*.
You can download the complete annotation files and video features here. The files are organized as follows:
```
dataset/
├── EMMAD-EDIT
│   ├── clip_cn_feats/
│   ├── example_videos/
│   ├── middle_files/
│   ├── abstract_attr/
│   ├── training.json
│   ├── validation.json
│   └── test.json
└── VATEX-EDIT
    ├── blip_en_feats/
    ├── clip_en_feats/
    ├── example_videos/
    ├── middle_files/
    ├── training.json
    ├── validation.json
    └── test.json
```
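The layout above can be addressed programmatically. The following is a minimal sketch, not part of the official tooling: the helper names `annotation_path` and `load_annotations` are hypothetical, the root folder name `dataset` is taken from the tree, and the loader assumes each split file is a single JSON document, which the source does not confirm.

```python
import json
from pathlib import Path

# Folder and file names copied from the directory tree above.
DATASETS = ("VATEX-EDIT", "EMMAD-EDIT")
SPLITS = ("training", "validation", "test")


def annotation_path(root: str, dataset: str, split: str) -> Path:
    """Build the path to one annotation file (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return Path(root) / dataset / f"{split}.json"


def load_annotations(root: str, dataset: str, split: str):
    """Load one annotation file.

    ASSUMPTION: the file is a single JSON document (for example a list
    of records). Adjust this if the real files use another layout.
    """
    with annotation_path(root, dataset, split).open(encoding="utf-8") as f:
        return json.load(f)


print(annotation_path("dataset", "VATEX-EDIT", "training"))
```

Validating the dataset and split names up front gives a clear error message instead of a late `FileNotFoundError`.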
## VATEX-EDIT
VATEX-EDIT is built automatically on top of the English video captioning dataset [VATEX (EN)](https://arxiv.org/pdf/1904.03493.pdf), which describes over 600 human activities. It contains 34,269 videos and 1,057,956 *(video, command, reference caption, edited caption)* quadruples. We follow the original VATEX dataset and split the data into training, validation, and public test sets by video ID.
| #Videos | #Editing instances |
| --- | --- |
| Train / Val / Test | Train / Val / Test |
| 25,467 / 2,935 / 5,867 | 784,805 / 91,513 / 181,638 |
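As a quick sanity check, the split sizes in the table should add up to the totals quoted in the description (34,269 videos and 1,057,956 editing instances). The sketch below copies the numbers from the table; the variable names are chosen for illustration only.

```python
# Split sizes copied from the table above.
videos = {"train": 25_467, "val": 2_935, "test": 5_867}
instances = {"train": 784_805, "val": 91_513, "test": 181_638}

total_videos = sum(videos.values())
total_instances = sum(instances.values())

# Totals quoted in the dataset description.
assert total_videos == 34_269
assert total_instances == 1_057_956

# Average number of editing instances per video in each split.
per_video = {split: instances[split] / videos[split] for split in videos}
for split, ratio in per_video.items():
    print(f"{split}: {ratio:.1f} editing instances per video")
```

The per-video ratio is similar across the three splits, which is what one would expect when the split is made by video ID.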
**Annotation format:**
```
{
  "vid": video id,
  "dtype": specific type of editing command,
  "command": operation (add/delete),
  "attr": edited attribute,
  "atype": type of attribute (verb/noun/modifier),
  "oldcap": reference caption,
  "reference": positioned reference using <mask> if assigning positions,
  "newcap": edited caption,
}
```
| dtype | global_len_add | global_len_dele | global_attr_add | global_attr_dele | local_len_add | local_len_dele | local_attr_add |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **command** | *<add, - , - >* | *<del, - , - >* | *<add, - , attr >* | *<del, - , attr >* | *<add, pos , - >* | *<del, pos , - >* | *<add,pos,attr>* |
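The table maps each `dtype` to a command triple of the form *<operation, position, attribute>*. One way to encode that mapping is sketched below. The dictionary is transcribed directly from the table, while the names `DTYPE_TO_COMMAND` and `format_command` are hypothetical and not part of the dataset's tooling.

```python
from typing import Dict, Tuple

# (operation, needs a position, needs an attribute) for each dtype,
# transcribed from the command row of the table above.
DTYPE_TO_COMMAND: Dict[str, Tuple[str, bool, bool]] = {
    "global_len_add": ("add", False, False),
    "global_len_dele": ("del", False, False),
    "global_attr_add": ("add", False, True),
    "global_attr_dele": ("del", False, True),
    "local_len_add": ("add", True, False),
    "local_len_dele": ("del", True, False),
    "local_attr_add": ("add", True, True),
}


def format_command(dtype: str) -> str:
    """Render a dtype as an <operation, position, attribute> triple."""
    operation, has_pos, has_attr = DTYPE_TO_COMMAND[dtype]
    pos = "pos" if has_pos else "-"
    attr = "attr" if has_attr else "-"
    return f"<{operation}, {pos}, {attr}>"


for name in DTYPE_TO_COMMAND:
    print(f"{name:18s} {format_command(name)}")
```

An explicit dictionary is used instead of parsing the `dtype` string so that only the seven combinations listed in the table are accepted.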
**Download**
```python
# Load the dataset with the ModelScope SDK (streaming mode).
from modelscope.msdatasets import MsDataset

ds = MsDataset.load('modelscope/Video_Description_Editing', subset_name='vatex', split='train', use_streaming=True)
for item in ds:
    print(item)
    # Each item has the fields:
    #   name <str>: the file base name
    #   path:FILE <str>: local path
```
**[Note]** Due to legal and privacy concerns, we cannot directly share the original videos from the VATEX dataset. You can obtain them by following the instructions on the [VATEX dataset website](https://eric-xw.github.io/vatex-website/download.html).
**Data Example**
<video src="QAgcSr8Khus_000012_000022.mp4"></video>
```
{
  "vid": "QAgcSr8Khus_000012_000022",
  "dtype": "local_attr_add",
  "command": "<add>",
  "attr": "drop",
  "atype": "verb",
  "oldcap": "a man pets a cat that 's there .",
  "reference": "a man <mask> pets a cat that 's there .",
  "newcap": ["a man drops some garbage off outdoors and pets a cat that 's there ."]
}
```
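In the example above, `<mask>` marks the position where the edit should be applied. The sketch below recovers the text that fills the mask by comparing `reference` with the edited caption. It assumes a single `<mask>` and that the text outside the mask is unchanged, which holds for this example but is not guaranteed for every sample (EMMAD-EDIT references can contain several masks). The function name `inserted_span` is hypothetical.

```python
from typing import Optional

MASK = "<mask>"


def inserted_span(reference: str, newcap: str) -> Optional[str]:
    """Return the text in newcap that replaces the single <mask>.

    Returns None when the text around the mask does not match newcap.
    ASSUMPTION: exactly one mask, and the surrounding text is unchanged.
    """
    prefix, sep, suffix = reference.partition(MASK)
    if not sep:
        raise ValueError("reference contains no <mask>")
    if MASK in suffix:
        raise ValueError("reference contains more than one <mask>")
    if len(newcap) < len(prefix) + len(suffix):
        return None
    if not (newcap.startswith(prefix) and newcap.endswith(suffix)):
        return None
    return newcap[len(prefix):len(newcap) - len(suffix)]


# Values copied from the data example above.
reference = "a man <mask> pets a cat that 's there ."
newcap = "a man drops some garbage off outdoors and pets a cat that 's there ."
print(inserted_span(reference, newcap))
```

For this sample the recovered span contains the verb "drops", an inflected form of the commanded attribute `drop`.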
## EMMAD-EDIT
EMMAD-EDIT is manually collected on top of the Chinese e-commerce video captioning dataset [E-MMAD](https://e-mmad.github.io/e-mmad.net/index.html). Given a product video, its original advertising description, and external product information, we further manually annotate video description editing samples.
The E-MMAD dataset contains 23,960 editing instances for 12,295 product videos in total and has two notable characteristics: long videos and descriptions, and diverse attributes. The average video length is 27.1 seconds and the average description length is around 100 words.
| | #Videos | #Editing instances |
| --- | --- | --- |
| EMMAD-EDIT | Train / Val / Test | Train / Val / Test |
| *specific subset* | 16,176 / 5,418 / 5,502 | 31,610 / 10,586 / 10,737 |
| *abstract subset* | 15,955 / 5,328 / 5,432 | 15,959 / 5,328 / 5,432 |
**Annotation format:**
```
{
  "vid": video id,
  "dtype": specific type of editing command,
  "command": operation (add/delete),
  "attr": edited attribute,
  "atype": type of attribute (specific/abstract),
  "oldcap": reference caption,
  "reference": positioned reference using <mask> if assigning positions,
  "newcap": edited caption,
  "allattr": product structure information,
  "video_url": url to download the original video,
  "video_title": video title,
}
```
| dtype | global_len_add | global_len_dele | global_attr_add | global_attr_dele | local_len_add | local_len_dele | local_attr_add |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **command** | *<add, - , - >* | *<del, - , - >* | *<add, - , attr >* | *<del, - , attr >* | *<add, pos , - >* | *<del, pos , - >* | *<add,pos,attr>* |
**Download**
```python
# Load the dataset with the ModelScope SDK (streaming mode).
from modelscope.msdatasets import MsDataset

ds = MsDataset.load('modelscope/Video_Description_Editing', subset_name='emmad', split='train', use_streaming=True)
for item in ds:
    print(item)
    # Each item has the fields:
    #   name <str>: the file base name
    #   path:FILE <str>: local path
```
We provide frame-level CLIP features, and you can download the original videos using the `video_url` field of each annotation. The more challenging *abstract attribute* subset is stored under `EMMAD-EDIT/abstract_attr/`.
**Data Example**
<video src="200563409291.mp4"></video>
```
{
  "vid": 200563409291,
  "dtype": "local_attr_add",
  "command": "<add>",
  "attr": "实用,俏皮",
  "atype": "specific",
  "oldcap": "经典水桶包,子母包设计,再现繁盛时代的小鹿包。外形酷似水桶,包身圆润,别致的子母包设计,追求简约、时尚,凸现自我的创意个性与个人色彩。",
  "reference": "经典水桶包,子母包设计,再现繁盛时代的小鹿包。外形酷似水桶,包身圆润<mask>,别致的子母包设计,追求简约<mask>、时尚,凸现自我的创意个性与个人色彩。",
  "newcap": "经典水桶包,子母包设计,再现繁盛时代的小鹿包。外形酷似水桶外型,包身圆润又不失俏皮的造型,别致的子母包设计,追求简约实用的时尚,凸现自我的创意个性与个人色彩。",
  "allattr": "品类:斜挎包,单肩包,水桶包,女包;时间季节:2020;新品:新款;风格:时尚,休闲,简约,潮流,欧美时尚;修饰:手提,小鹿;人群:女士;上市时间:2018年春夏;大小:中;箱包硬度:软;款式:单肩包;里料材质:织物;背包方式:单肩斜挎手提;内部结构:手机袋,证件袋,拉链暗袋;品牌:VANESSA HOGAN;颜色分类:香草白1,黑色,香草白,婴儿粉;皮革材质:牛皮;是否可折叠:否;适用场景:休闲;图案:纯色;质地:牛皮;流行元素:车缝线;货号:VH1804158020405;肩带样式:单根;形状:水桶形;销售渠道类型:商场同款(线上线下都销售);流行款式名称:水桶包;成色:全新;提拎部件类型:软把;闭合方式:磁扣;适用对象:青年;有无夹层:无;"
}
```
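The `allattr` field in the example above packs the product information into one string: `key:value` pairs separated by `;`, with multiple values separated by `,`. The sketch below turns such a string into a dictionary. The delimiters are inferred from this single example rather than from a documented specification, and the function name `parse_allattr` is hypothetical.

```python
from typing import Dict, List


def parse_allattr(allattr: str) -> Dict[str, List[str]]:
    """Split 'key:v1,v2;key2:v3;' into {'key': ['v1', 'v2'], 'key2': ['v3']}.

    ASSUMPTION: ';' separates fields, the first ':' separates key from
    values, and ',' separates values, as in the example record.
    """
    attrs: Dict[str, List[str]] = {}
    for field in allattr.split(";"):
        if not field.strip():
            continue  # skip the empty piece after the trailing ';'
        key, sep, value = field.partition(":")
        if not sep:
            continue  # ignore malformed fields without a ':'
        attrs[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return attrs


# A short excerpt of the allattr string from the example above.
sample = "品类:斜挎包,单肩包,水桶包,女包;时间季节:2020;品牌:VANESSA HOGAN;有无夹层:无;"
for key, values in parse_allattr(sample).items():
    print(key, values)
```

Keeping every value as a list, even for single-valued fields, gives downstream code one uniform type to handle.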
## Citation
```
@article{Yao2023VDE,
title={Edit As You Wish: Video Description Editing with Multi-grained Commands},
author={Linli Yao and Yuanmeng Zhang and Ziheng Wang and Xinglin Hou and Tiezheng Ge and Yuning Jiang and Qin Jin},
journal={arXiv preprint arXiv:2305.08389},
year={2023}
}
```
Provider: maas
Created: 2023-10-21



