panda-70m
ModelScope community. Updated 2026-05-16; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/panda-70m
Resource description:
### Clone with HTTP
```bash
git clone https://www.modelscope.cn/datasets/AI-ModelScope/panda-70m.git
```
### Load with SDK
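Because the videos come from YouTube, the ModelScope platform does not hold the rights to host or distribute them. The dataset therefore contains only metadata such as URLs, and users must download the videos themselves, as follows:
#### 1. Install the YouTube download tool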
```bash
pip3 install --upgrade --force-reinstall "git+https://github.com/ytdl-org/youtube-dl.git"
```
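Before starting a long download job it can be worth confirming that the `youtube-dl` executable is actually on `PATH`. The following is a minimal sketch using only the standard library; `tool_available` is a hypothetical helper name introduced here, not part of ModelScope or youtube-dl.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not tool_available("youtube-dl"):
        print("youtube-dl not found; install it before running the download step")
```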
#### 2. Load the data with the SDK and run the download
```python
import os
import subprocess
from functools import partial

from modelscope import MsDataset


# Mapping function: download one video and record its local path
def download_url(example: dict, output_dir: str):
    output_dir = os.path.expanduser(output_dir)
    video_id: str = example['videoID']
    video_url: str = example['url']
    video_path: str = os.path.join(output_dir, video_id)

    # Skip videos that were already downloaded
    if os.path.exists(video_path):
        print(f'** Reusing video {video_path}')
        example['video_path'] = video_path
        return example

    # Pass arguments as a list so the URL and path need no shell quoting
    subprocess.run(['youtube-dl', '-o', video_path, video_url], check=False)
    example['video_path'] = video_path
    return example


download_url_func = partial(download_url, output_dir='/path/to/your_data')

# Use the SDK to load the metadata, then download the videos
ds = MsDataset.load('AI-ModelScope/panda-70m', subset_name='default', split='validation').to_hf_dataset()
ds = ds.map(download_url_func, num_proc=4)
print(next(iter(ds)))
```
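The mapping function above sets `video_path` whether or not the download succeeded. A variant that checks the exit code might look like the sketch below. It assumes `youtube-dl` is on `PATH` and signals failure with a non-zero exit code; `download_one` and its `runner` parameter are hypothetical names introduced for illustration, not part of the ModelScope SDK.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence


def download_one(
    video_id: str,
    video_url: str,
    output_dir: str,
    runner: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> Optional[str]:
    """Return the local path on success, or None if the download failed."""
    output_dir = os.path.expanduser(output_dir)
    os.makedirs(output_dir, exist_ok=True)
    video_path = os.path.join(output_dir, video_id)
    if os.path.exists(video_path):
        return video_path  # already present, reuse it
    # The command is passed as a list, so no shell quoting is involved.
    exit_code = runner(["youtube-dl", "-o", video_path, video_url])
    return video_path if exit_code == 0 else None
```

Inside a `map` function, the return value could be stored in `example['video_path']`, so that rows whose download failed carry `None` and can be filtered out or retried later. The `runner` argument exists only so the logic can be exercised without network access.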
Provider: maas
Created: 2024-03-06



