pseudolabel-science-large-v3-timestamp
ModelScope community · Updated 2025-11-12 · Indexed 2025-10-11
Download link:
https://modelscope.cn/datasets/mesolitica/pseudolabel-science-large-v3-timestamp
Description:
# Pseudolabel science context audio using Whisper Large V3
The original audio comes from [malaysia-ai/science-context-youtube](https://huggingface.co/datasets/malaysia-ai/science-context-youtube). We split it into 30-second chunks and pseudolabelled each chunk using Whisper Large V3.
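To illustrate the 30-second split, here is a minimal sketch that computes chunk boundaries for a recording of known duration. The function name `chunk_bounds` is hypothetical, and so is the handling of the final chunk (kept, and shorter than 30 seconds). The card does not say how the last partial chunk was treated.

```python
import math


def chunk_bounds(duration_s, chunk_s=30.0):
    """Return (start, end) offsets in seconds that cover a recording in fixed-size chunks.

    Assumption: the trailing partial chunk is kept and simply ends at the
    recording's duration. The dataset card does not specify this.
    """
    duration_s = float(duration_s)
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    n_chunks = math.ceil(duration_s / chunk_s)
    # Compute each start from the index to avoid accumulating float error.
    return [
        (i * chunk_s, min((i + 1) * chunk_s, duration_s))
        for i in range(n_chunks)
    ]
```

For a 75-second recording this yields three chunks, the last one 15 seconds long.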
## How to prepare the dataset
```bash
huggingface-cli download --repo-type dataset \
--include 'science-chunk-*.zip' \
--local-dir './' \
--max-workers 20 \
mesolitica/pseudolabel-science-large-v3-timestamp
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```
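The contents of `unzip.py` are not reproduced here. If the gist is unavailable, the extraction step could be approximated with the standard library, as in the sketch below. The function name `extract_archives` and its defaults are assumptions. It is not the author's script, which may differ (for example by extracting in parallel).

```python
import glob
import zipfile
from pathlib import Path


def extract_archives(pattern="science-chunk-*.zip", out_dir="."):
    """Extract every zip archive matching `pattern` into `out_dir`.

    Returns the sorted list of archive paths that were extracted.
    Assumption: the archives are plain zip files with no password.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(glob.glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
        extracted.append(archive)
    return extracted
```

Calling `extract_archives()` from the download directory unpacks all `science-chunk-*.zip` files in place.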
Provider:
maas
Created:
2025-10-04



