Video_Description_Editing

Name: Video_Description_Editing
Creator: maas
Published: 2026-04-28 16:13:23
License: No description available

ModelScope community. Updated: 2026-04-28. Indexed: 2024-05-15.

Download link:

https://modelscope.cn/datasets/modelscope/Video_Description_Editing

Resource overview:

# Dataset

VATEX-EDIT and EMMAD-EDIT are two datasets that support the **Video Description Editing** task. Each data sample is a quadruple *(video, command, reference caption, edited caption)*. You can download the complete annotation files and video features here. All files are structured as follows:

```
dataset/
├── EMMAD-EDIT
│   ├── clip_cn_feats/
│   ├── example_videos/
│   ├── middle_files/
│   ├── abstract_attr/
│   ├── training.json
│   ├── validation.json
│   └── test.json
└── VATEX-EDIT
    ├── blip_en_feats/
    ├── clip_en_feats/
    ├── example_videos/
    ├── middle_files/
    ├── training.json
    ├── validation.json
    └── test.json
```

## VATEX-EDIT

VATEX-EDIT is built automatically on top of the English video captioning dataset [VATEX (EN)](https://arxiv.org/pdf/1904.03493.pdf), which describes over 600 human activities. It contains 34,269 videos and 1,057,956 *(video, command, reference caption, edited caption)* quadruples. We follow the original VATEX dataset to obtain the training, validation and public test splits by video ID.

| #Videos (Train / Val / Test) | #Editing instances (Train / Val / Test) |
| --- | --- |
| 25,467 / 2,935 / 5,867 | 784,805 / 91,513 / 181,638 |

**Annotation format:**

```
{
  "vid": video id,
  "dtype": specific type of editing command,
  "command": operation (add/delete),
  "attr": edited attribute,
  "atype": type of attribute (verb/noun/modifier),
  "oldcap": reference caption,
  "reference": positioned reference using <mask> if assigning positions,
  "newcap": edited caption
}
```

| dtype | global_len_add | global_len_dele | global_attr_add | global_attr_dele | local_len_add | local_len_dele | local_attr_add |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **command** | `<add, -, ->` | `<del, -, ->` | `<add, -, attr>` | `<del, -, attr>` | `<add, pos, ->` | `<del, pos, ->` | `<add, pos, attr>` |

**Download**

```python
from modelscope.msdatasets import MsDataset

# Load the dataset with the ModelScope SDK (streaming mode).
ds = MsDataset.load(
    'modelscope/Video_Description_Editing',
    subset_name='vatex',
    split='train',
    use_streaming=True,
)

# Each item exposes:
#   name <str>: the file base name
#   path:FILE <str>: local path
for item in ds:
    print(item)
```

**[Note]** Due to legal and privacy concerns, we cannot directly share the original videos from the VATEX dataset. You can obtain them by following the instructions on the [VATEX dataset website](https://eric-xw.github.io/vatex-website/download.html).
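
The `dtype` names in the table above appear to follow a `<scope>_<grain>_<operation>` pattern that maps onto the command triple. The following is a minimal sketch of that mapping, inferred only from the seven table entries; `EditCommand` and `parse_dtype` are hypothetical helper names and are not part of the dataset or the ModelScope SDK.

```python
from typing import NamedTuple


class EditCommand(NamedTuple):
    """Hypothetical container for a decoded `dtype` string."""
    operation: str    # "add" or "del"
    positioned: bool  # True when the edit position is given (local_*)
    has_attr: bool    # True when an attribute is specified (*_attr_*)

    def as_triple(self) -> str:
        """Render the command in the table's <op,pos,attr> notation."""
        pos = "pos" if self.positioned else "-"
        attr = "attr" if self.has_attr else "-"
        return f"<{self.operation},{pos},{attr}>"


# The dtype suffix "dele" corresponds to the "del" operation in the table.
_OPERATIONS = {"add": "add", "dele": "del"}


def parse_dtype(dtype: str) -> EditCommand:
    """Decode a dtype such as 'local_attr_add' (assumed naming pattern)."""
    parts = dtype.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected dtype: {dtype!r}")
    scope, grain, op = parts
    if scope not in ("global", "local") or grain not in ("len", "attr") or op not in _OPERATIONS:
        raise ValueError(f"unexpected dtype: {dtype!r}")
    return EditCommand(
        operation=_OPERATIONS[op],
        positioned=(scope == "local"),
        has_attr=(grain == "attr"),
    )


if __name__ == "__main__":
    for name in ("global_len_add", "global_attr_dele", "local_attr_add"):
        print(name, parse_dtype(name).as_triple())
```

Decoding the string once keeps downstream filtering (for example, selecting only positioned edits) independent of the exact `dtype` spelling.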

**Data Example**

<video src="QAgcSr8Khus_000012_000022.mp4"></video>

```
{
  "vid": "QAgcSr8Khus_000012_000022",
  "dtype": "local_attr_add",
  "command": "<add>",
  "attr": "drop",
  "atype": "verb",
  "oldcap": "a man pets a cat that 's there .",
  "reference": "a man <mask> pets a cat that 's there .",
  "newcap": ["a man drops some garbage off outdoors and pets a cat that 's there ."]
}
```

## EMMAD-EDIT

EMMAD-EDIT is manually collected based on the Chinese e-commerce video captioning dataset [E-MMAD](https://e-mmad.github.io/e-mmad.net/index.html). Given the product video, the original advertising video description and external product information, we manually annotate video description editing samples. The E-MMAD dataset has 23,960 editing instances in total for 12,295 product videos and two notable characteristics: long videos and descriptions, and diverse attributes. The average video length is 27.1 seconds and the average description length is around 100 words.
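
Both datasets mark edit positions with `<mask>` in the `reference` field, as in the VATEX-EDIT data example above. In that example, removing the markers from `reference` gives back `oldcap`. The sketch below checks this relationship for a record; it is an observation from the published examples rather than a documented guarantee, and `strip_masks`, `count_masks` and `reference_matches_oldcap` are hypothetical helper names.

```python
MASK = "<mask>"


def strip_masks(reference: str) -> str:
    """Remove <mask> markers and collapse any doubled whitespace left behind."""
    return " ".join(reference.replace(MASK, "").split())


def count_masks(reference: str) -> int:
    """Number of positions marked for editing."""
    return reference.count(MASK)


def reference_matches_oldcap(record: dict) -> bool:
    """True when `reference` is `oldcap` with only <mask> markers inserted."""
    oldcap = " ".join(record["oldcap"].split())
    return strip_masks(record["reference"]) == oldcap


if __name__ == "__main__":
    record = {
        "oldcap": "a man pets a cat that 's there .",
        "reference": "a man <mask> pets a cat that 's there .",
    }
    print(count_masks(record["reference"]), reference_matches_oldcap(record))
```

A check like this can flag records whose `reference` was altered beyond the inserted markers before they are used for training.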

| EMMAD-EDIT | #Videos (Train / Val / Test) | #Editing instances (Train / Val / Test) |
| --- | --- | --- |
| *specific subset* | 16,176 / 5,418 / 5,502 | 31,610 / 10,586 / 10,737 |
| *abstract subset* | 15,955 / 5,328 / 5,432 | 15,959 / 5,328 / 5,432 |

**Annotation format:**

```
{
  "vid": video id,
  "dtype": specific type of editing command,
  "command": operation (add/delete),
  "attr": edited attribute,
  "atype": type of attribute (specific/abstract),
  "oldcap": reference caption,
  "reference": positioned reference using <mask> if assigning positions,
  "newcap": edited caption,
  "allattr": product structure information,
  "video_url": url to download the original video,
  "video_title": video title
}
```

| dtype | global_len_add | global_len_dele | global_attr_add | global_attr_dele | local_len_add | local_len_dele | local_attr_add |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **command** | `<add, -, ->` | `<del, -, ->` | `<add, -, attr>` | `<del, -, attr>` | `<add, pos, ->` | `<del, pos, ->` | `<add, pos, attr>` |

**Download**

```python
from modelscope.msdatasets import MsDataset

# Load the dataset with the ModelScope SDK (streaming mode).
ds = MsDataset.load(
    'modelscope/Video_Description_Editing',
    subset_name='emmad',
    split='train',
    use_streaming=True,
)

# Each item exposes:
#   name <str>: the file base name
#   path:FILE <str>: local path
for item in ds:
    print(item)
```

We provide frame-level CLIP features, and you can download the original videos using the video URL. The more challenging *abstract attribute* subset is placed under `EMMAD-EDIT/abstract_attr/`.
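
The `allattr` field packs the product structure information into a single string. Judging from the data example in this section, it looks like `key:value1,value2;key:value;...`. The sketch below splits such a string into a dictionary under that assumption; the format is inferred from one example only, and `parse_allattr` is a hypothetical helper name.

```python
from typing import Dict, List


def parse_allattr(allattr: str) -> Dict[str, List[str]]:
    """Split 'key:v1,v2;key:v;' into {key: [v1, v2], ...} (assumed format)."""
    attrs: Dict[str, List[str]] = {}
    for field in allattr.split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after the trailing ';'
        key, sep, values = field.partition(":")
        if not sep:
            continue  # no ':' separator, so the field is not a key/value pair
        attrs.setdefault(key, []).extend(v for v in values.split(",") if v)
    return attrs


if __name__ == "__main__":
    sample = "品类:斜挎包,单肩包,水桶包,女包;大小:中;闭合方式:磁扣;"
    for key, values in parse_allattr(sample).items():
        print(key, values)
```

Repeated keys are merged into one list, so a string that mentions the same attribute name twice does not lose values.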

**Data Example**

<video src="200563409291.mp4"></video>

```
{
  "vid": 200563409291,
  "dtype": "local_attr_add",
  "command": "<add>",
  "attr": "实用,俏皮",
  "atype": "specific",
  "oldcap": "经典水桶包，子母包设计，再现繁盛时代的小鹿包。外形酷似水桶，包身圆润，别致的子母包设计，追求简约、时尚，凸现自我的创意个性与个人色彩。",
  "reference": "经典水桶包，子母包设计，再现繁盛时代的小鹿包。外形酷似水桶，包身圆润<mask>，别致的子母包设计，追求简约<mask>、时尚，凸现自我的创意个性与个人色彩。",
  "newcap": "经典水桶包，子母包设计，再现繁盛时代的小鹿包。外形酷似水桶外型，包身圆润又不失俏皮的造型，别致的子母包设计，追求简约实用的时尚，凸现自我的创意个性与个人色彩。",
  "allattr": "品类:斜挎包,单肩包,水桶包,女包;时间季节:2020;新品:新款;风格:时尚,休闲,简约,潮流,欧美时尚;修饰:手提,小鹿;人群:女士;上市时间:2018年春夏;大小:中;箱包硬度:软;款式:单肩包;里料材质:织物;背包方式:单肩斜挎手提;内部结构:手机袋,证件袋,拉链暗袋;品牌:VANESSA HOGAN;颜色分类:香草白1,黑色,香草白,婴儿粉;皮革材质:牛皮;是否可折叠:否;适用场景:休闲;图案:纯色;质地:牛皮;流行元素:车缝线;货号:VH1804158020405;肩带样式:单根;形状:水桶形;销售渠道类型:商场同款(线上线下都销售);流行款式名称:水桶包;成色:全新;提拎部件类型:软把;闭合方式:磁扣;适用对象:青年;有无夹层:无;"
}
```

## Citation

```
@article{Yao2023VDE,
  title={Edit As You Wish: Video Description Editing with Multi-grained Commands},
  author={Linli Yao and Yuanmeng Zhang and Ziheng Wang and Xinglin Hou and Tiezheng Ge and Yuning Jiang and Qin Jin},
  journal={arXiv preprint arXiv:2305.08389},
  year={2023}
}
```


Provider: maas

Created: 2023-10-21
