Sreevardhan1729/ActivityNet_Captions
Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/Sreevardhan1729/ActivityNet_Captions
---
configs:
- config_name: default
  data_files:
  - split: train
    path: "activitynet_captions_train.json"
  - split: val1
    path: "activitynet_captions_val1.json"
  - split: val2
    path: "activitynet_captions_val2.json"
task_categories:
- text-to-video
- text-retrieval
- video-classification
language:
- en
size_categories:
- 10K<n<100K
---
## About
[ActivityNet Captions](https://openaccess.thecvf.com/content_iccv_2017/html/Krishna_Dense-Captioning_Events_in_ICCV_2017_paper.html) contains 20K long-form YouTube videos (180 seconds long on average) and 100K captions. Most videos contain more than 3 annotated events. Following existing work, we concatenate the short temporal descriptions of each video into a single paragraph and evaluate paragraph-to-video retrieval on this benchmark.
We adopt the official split:
- **Train:** 10,009 videos, 10,009 captions (concatenated from 37,421 short captions)
- **Test (Val1):** 4,917 videos, 4,917 captions (concatenated from 17,505 short captions)
- **Val2:** 4,885 videos, 4,885 captions (concatenated from 17,031 short captions)
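
The concatenation step described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the released JSON files: the `(start_sec, end_sec, sentence)` tuple layout, the `build_paragraph` name, the ordering by start time, and the punctuation handling are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical event layout: (start_sec, end_sec, sentence).
# The actual schema of the released JSON files may differ.
Event = Tuple[float, float, str]


def build_paragraph(events: Iterable[Event]) -> str:
    """Join the short captions of one video into a single paragraph."""
    # Assumption: events are ordered by start time (ties broken by end time).
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    sentences = []
    for _start, _end, text in ordered:
        text = " ".join(text.split())  # collapse stray whitespace
        if not text:
            continue  # skip empty captions
        if text[-1] not in ".!?":
            text += "."  # assumption: make sure each caption ends a sentence
        sentences.append(text)
    return " ".join(sentences)


events = [
    (12.0, 30.5, "He throws the ball "),
    (0.0, 11.9, "A man walks onto a field."),
]
print(build_paragraph(events))
# A man walks onto a field. He throws the ball.
```

With this scheme every video yields exactly one paragraph, which matches the one-caption-per-video counts in the split above.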
---
## Get Raw Videos
```bash
cat ActivityNet_Videos.tar.part-* | tar -vxf -
```
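
Where `cat` and `tar` are not available, the same join-and-extract step can be approximated in Python. The sketch below assumes the parts sort lexically in the right order (as the shell glob above also assumes); `ChainedReader` and `extract_tar_parts` are hypothetical helper names, not part of this dataset or of any library.

```python
import glob
import tarfile


class ChainedReader:
    """Minimal read-only file-like object that reads several files back to back."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._current = None

    def read(self, size=-1):
        chunks = []
        remaining = size
        while size < 0 or remaining > 0:
            if self._current is None:
                if not self._paths:
                    break  # no parts left
                self._current = open(self._paths.pop(0), "rb")
            data = self._current.read(remaining if size >= 0 else -1)
            if not data:
                # Current part is exhausted; move on to the next one.
                self._current.close()
                self._current = None
                continue
            chunks.append(data)
            if size >= 0:
                remaining -= len(data)
        return b"".join(chunks)

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None


def extract_tar_parts(pattern, dest="."):
    """Stream all parts matching `pattern` into tarfile and extract them into `dest`."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    reader = ChainedReader(parts)
    try:
        # "r|*" reads a non-seekable stream and detects compression transparently.
        with tarfile.open(fileobj=reader, mode="r|*") as archive:
            archive.extractall(dest)
    finally:
        reader.close()
    return parts


if __name__ == "__main__":
    extract_tar_parts("ActivityNet_Videos.tar.part-*")
```

Streaming the parts avoids writing a second full-size copy of the archive to disk, which matters for a video archive of this size.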
---
## Official Release
ActivityNet Official Release: [ActivityNet Download](http://activity-net.org/download.html)
---
## 🌟 Citation
```bibtex
@inproceedings{caba2015activitynet,
title={Activitynet: A large-scale video benchmark for human activity understanding},
author={Caba Heilbron, Fabian and Escorcia, Victor and Ghanem, Bernard and Carlos Niebles, Juan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2015}
}
```

Provider: Sreevardhan1729