Youku_Dense_Caption

Name: Youku_Dense_Caption
Creator: maas
Published: 2026-05-16 22:50:35
License: No description provided

ModelScope community · updated 2026-05-16 · indexed 2025-02-22

Download link:

https://modelscope.cn/datasets/os_ai/Youku_Dense_Caption

Resource summary:

# Youku Dense Caption Dataset 🎥

<div align="center">

![License](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-blue.svg)
![Videos](https://img.shields.io/badge/Videos-31.4K-green)
![Captions](https://img.shields.io/badge/Captions-311.9K-orange)
![Language](https://img.shields.io/badge/Language-Chinese-red)

</div>

## 📊 Dataset Overview

A comprehensive collection of Chinese video captions from Youku (优酷), featuring:

- **📹 Videos**: 31,466 complete short videos
- **✍️ Captions**: 311,921 Chinese captions
- **🈺 Language**: Chinese
- **📱 Source**: Youku platform (优酷)

## 🚀 Usage

The dataset can be downloaded from [ModelScope](https://modelscope.cn/datasets/os_ai/Youku_Dense_Caption/).

### 1. Dataset Download ⬇️

```bash
# Set up the Git LFS filters for your user account
git lfs install

# Clone the dataset; replace your_git_token with your ModelScope access token.
# `git lfs clone` is deprecated: once Git LFS is installed, plain `git clone` also fetches the LFS files.
git clone https://oauth2:your_git_token@www.modelscope.cn/datasets/os_ai/Youku_Dense_Caption.git
```

> 🔑 **Get a token**: visit https://modelscope.cn/my/myaccesstoken

### 2. Dataset Structure 📁

#### 📌 benchmark_files/

Benchmark data for two tasks:

- 🎯 Video caption generation
- 📍 Video moment retrieval

#### 📌 meta_files/

Core dataset metadata:

- 📝 Video category information
- 🔗 Video file paths
- 💬 Complete caption text

#### 📌 data_files/

Main data storage, organized by category:

```
data_files/
├── Agriculture/
│   ├── train/  (zipped)
│   ├── val/    (zipped)
│   └── test/   (ready for preview)
├── Children/
└── ...
```

### 3. Usage Guide 📖

1. **After download**:
   - Navigate to the target category folder.
   - Example: `cd data_files/Agriculture`
2. **Data preparation**:
   - Unzip the files in the `train/` and `val/` directories.
   - Files in the `test/` directory are ready to use.

> ⚠️ **Important notes**:
> - `train` and `val` data are stored in compressed form and must be extracted before use.
> - `test` data is directly accessible for preview and testing.

---

💡 For questions, please refer to the project documentation or submit an issue.

## 📚 Citation

If you use this dataset in your research, please cite:

```bibtex
@inproceedings{xiong2025youku,
  title={Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks},
  author={Zixuan Xiong and Guangwei Xu and Wenkai Zhang and Yuan Miao and Xuan Wu and LinHai and Ruijie Guo and Hai-Tao Zheng},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=vvi5OjPhbu}
}
```

## 📄 License

This dataset is released under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.

---

<div align="center">

⭐ Star us on GitHub if you find this dataset useful! ⭐

</div>
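
The **Data preparation** step in the Usage Guide (extract `train/` and `val/`, leave `test/` as-is) can also be scripted across all categories. Below is a minimal Python sketch, not an official tool. It assumes the `data_files/<Category>/{train,val,test}/` layout described above, and that the compressed `train/` and `val/` data are standard `.zip` archives; the archive names and extension are assumptions. `list_categories` and `extract_category` are hypothetical helper names.

```python
import zipfile
from pathlib import Path

# Hypothetical helpers, not shipped with the dataset.
# Assumed layout (from the README): data_files/<Category>/{train,val,test}/
# Assumed archive format: standard .zip files inside train/ and val/.
COMPRESSED_SPLITS = ("train", "val")  # test/ is already uncompressed


def list_categories(data_root: Path) -> list[str]:
    """Return the sorted names of the category folders under data_root."""
    return sorted(p.name for p in data_root.iterdir() if p.is_dir())


def extract_category(category_dir: Path) -> list[Path]:
    """Extract each .zip in train/ and val/ into the split folder it sits in.

    Returns the archives that were extracted, in split order, then name order.
    """
    extracted: list[Path] = []
    for split in COMPRESSED_SPLITS:
        split_dir = category_dir / split
        if not split_dir.is_dir():
            continue  # skip a split folder that is not present
        for archive in sorted(split_dir.glob("*.zip")):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(split_dir)
            extracted.append(archive)
    return extracted


if __name__ == "__main__":
    data_root = Path("data_files")  # run from the root of the cloned dataset
    if not data_root.is_dir():
        print("data_files/ not found; run this from the cloned dataset directory")
    else:
        for name in list_categories(data_root):
            archives = extract_category(data_root / name)
            print(f"{name}: extracted {len(archives)} archive(s)")
```

`extractall` writes into the split folder itself, so the extracted files sit next to their archive. Change the target directory if you prefer a separate output location.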


Provided by:

maas

Created:

2025-05-07
