
Sreevardhan1729/ActivityNet_Captions

Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Sreevardhan1729/ActivityNet_Captions
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: "activitynet_captions_train.json"
  - split: val1
    path: "activitynet_captions_val1.json"
  - split: val2
    path: "activitynet_captions_val2.json"
task_categories:
- text-to-video
- text-retrieval
- video-classification
language:
- en
size_categories:
- 10K<n<100K
---

## About

[ActivityNet Captions](https://openaccess.thecvf.com/content_iccv_2017/html/Krishna_Dense-Captioning_Events_in_ICCV_2017_paper.html) contains 20K long-form YouTube videos (average length 180 s) and 100K captions. Most videos contain more than three annotated events. Following existing work, we concatenate each video's short temporal descriptions into a single long caption and evaluate 'paragraph-to-video' retrieval on this benchmark.

We adopt the official split:

- **Train:** 10,009 videos, 10,009 captions (concatenated from 37,421 short captions)
- **Test (Val1):** 4,917 videos, 4,917 captions (concatenated from 17,505 short captions)
- **Val2:** 4,885 videos, 4,885 captions (concatenated from 17,031 short captions)

---

## Get Raw Videos

```bash
# Join the split archive parts and extract them in one pass
cat ActivityNet_Videos.tar.part-* | tar -vxf -
```

---

## Official Release

ActivityNet Official Release: [ActivityNet Download](http://activity-net.org/download.html)

---

## 🌟 Citation

```bibtex
@inproceedings{caba2015activitynet,
  title={Activitynet: A large-scale video benchmark for human activity understanding},
  author={Caba Heilbron, Fabian and Escorcia, Victor and Ghanem, Bernard and Carlos Niebles, Juan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2015}
}
```
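
The caption concatenation described in the About section can be sketched roughly as follows. This is a minimal illustration, not the procedure used to build the released files: the record layout (`timestamps` and `sentences` keyed by video ID), the `build_paragraph` helper, and the choice to order events by start time are all assumptions, and the sample record is made up.

```python
from typing import Sequence


def build_paragraph(timestamps: Sequence[Sequence[float]],
                    sentences: Sequence[str]) -> str:
    """Join per-event captions into one paragraph, ordered by event start time.

    Hypothetical helper: the ordering rule is an assumption, not a documented
    property of the dataset.
    """
    if len(timestamps) != len(sentences):
        raise ValueError("timestamps and sentences must have the same length")
    # Sort event indices by start time; sorted() is stable, so ties keep input order.
    order = sorted(range(len(sentences)), key=lambda i: timestamps[i][0])
    # Trim stray whitespace and skip empty captions before joining.
    parts = [sentences[i].strip() for i in order]
    return " ".join(p for p in parts if p)


# Made-up record for illustration; the field names are an assumed schema.
example = {
    "v_example": {
        "timestamps": [[12.0, 30.5], [0.0, 11.2]],
        "sentences": [" She starts dancing.", "A woman walks onto a stage. "],
    }
}

paragraphs = {
    video_id: build_paragraph(rec["timestamps"], rec["sentences"])
    for video_id, rec in example.items()
}
print(paragraphs["v_example"])
```

With one paragraph per video, each split ends up with as many captions as videos, which matches the counts listed above (for example, 10,009 videos and 10,009 captions in the train split).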

Provided by:
Sreevardhan1729