Sreevardhan1729/ActivityNet_Captions

Name: Sreevardhan1729/ActivityNet_Captions
Creator: Sreevardhan1729
Published: 2026-04-01 16:41:55
License: no description provided

Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/Sreevardhan1729/ActivityNet_Captions

Description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: "activitynet_captions_train.json"
  - split: val1
    path: "activitynet_captions_val1.json"
  - split: val2
    path: "activitynet_captions_val2.json"
task_categories:
- text-to-video
- text-retrieval
- video-classification
language:
- en
size_categories:
- 10K<n<100K
---

## About

[ActivityNet Captions](https://openaccess.thecvf.com/content_iccv_2017/html/Krishna_Dense-Captioning_Events_in_ICCV_2017_paper.html) contains 20K long-form YouTube videos (180 s long on average) and 100K captions. Most videos contain more than three annotated events. Following existing work, we concatenate each video's short temporal descriptions into one long caption and evaluate ‘paragraph-to-video’ retrieval on this benchmark.

We adopt the official split:

- **Train:** 10,009 videos, 10,009 captions (concatenated from 37,421 short captions)
- **Test (Val1):** 4,917 videos, 4,917 captions (concatenated from 17,505 short captions)
- **Val2:** 4,885 videos, 4,885 captions (concatenated from 17,031 short captions)

---

## Get Raw Videos

```bash
cat ActivityNet_Videos.tar.part-* | tar -vxf -
```

---

## Official Release

ActivityNet Official Release: [ActivityNet Download](http://activity-net.org/download.html)

---

## 🌟 Citation

```bibtex
@inproceedings{caba2015activitynet,
  title={Activitynet: A large-scale video benchmark for human activity understanding},
  author={Caba Heilbron, Fabian and Escorcia, Victor and Ghanem, Bernard and Carlos Niebles, Juan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2015}
}
```
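
The caption concatenation described in the About section can be sketched as below. This is a minimal illustration, not the dataset author's script. It assumes each video record holds parallel `sentences` and `timestamps` lists, and that short captions should be joined in order of segment start time. The JSON files in this repository may already store the concatenated captions under a different layout. The helper `build_paragraph` and the sample record are hypothetical.

```python
from typing import Dict, List


def build_paragraph(sentences: List[str], timestamps: List[List[float]]) -> str:
    """Join one video's short captions into a single long caption."""
    # Assumption: order captions by segment start time so the text follows the video.
    ordered = sorted(zip(timestamps, sentences), key=lambda pair: pair[0][0])
    # Strip stray whitespace and drop empty captions before joining.
    cleaned = [text.strip() for _, text in ordered if text.strip()]
    return " ".join(cleaned)


# Hypothetical record in the assumed layout (not copied from the dataset files).
example = {
    "v_example": {
        "duration": 30.0,
        "timestamps": [[12.0, 30.0], [0.0, 11.5]],
        "sentences": [
            " He then rinses the car with a hose.",
            "A man washes a car with a sponge. ",
        ],
    }
}

# One long caption per video, matching the one-caption-per-video counts in the split list.
paragraphs: Dict[str, str] = {
    video_id: build_paragraph(record["sentences"], record["timestamps"])
    for video_id, record in example.items()
}
print(paragraphs["v_example"])
```

Joining with a single space keeps each short caption's own punctuation intact. If the files use another schema, only the two key names passed to `build_paragraph` need to change.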

Provider:

Sreevardhan1729
