Zarakun/youtube_ua_subtitles_test
Source: Hugging Face. Updated 2024-01-17; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Zarakun/youtube_ua_subtitles_test
Resource description:
---
task_categories:
- automatic-speech-recognition
pretty_name: MangoSpeech
configs:
- config_name: rozdympodcast
  data_files: "data/rozdympodcast.parquet"
- config_name: opodcast
  data_files: "data/opodcast.parquet"
- config_name: test
  data_files: "data/test.parquet"
---
# The list of all subsets in the dataset
Each subset is generated by splitting videos from one particular Ukrainian YouTube channel.
All subsets are in the test split.
- "opodcast" subset is from channel "О! ПОДКАСТ"
- "rozdympodcast" subset is from channel "Роздум | Подкаст"
- "test" subset is just a small subset of samples
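Because the YAML header of the card declares each subset as a config, the subset name can presumably be passed as the config name (the second argument of `load_dataset`). The following is a minimal sketch under that assumption; it requires the `datasets` library, and `SUBSETS`, `REPO_ID`, and `load_by_config` are illustrative names that are not part of the dataset itself.

```python
# Subset (config) name -> source channel, as listed above.
SUBSETS = {
    "opodcast": "О! ПОДКАСТ",
    "rozdympodcast": "Роздум | Подкаст",
    "test": None,  # small sample subset; the card names no channel for it
}

REPO_ID = "Zarakun/youtube_ua_subtitles_test"


def load_by_config(subset):
    """Load one subset by its config name (hypothetical helper)."""
    if subset not in SUBSETS:
        raise ValueError(
            f"unknown subset {subset!r}; expected one of {sorted(SUBSETS)}"
        )
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, subset)
```

Calling `load_by_config("opodcast")` downloads the data on first use; an unknown name fails early with a `ValueError` instead of a network error.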
# Loading a particular subset
```
>>> from datasets import load_dataset
>>> data_files = {"train": "data/<your_subset>.parquet"}
>>> data = load_dataset("Zarakun/youtube_ua_subtitles_test", data_files=data_files)
>>> data
DatasetDict({
    train: Dataset({
        features: ['audio', 'rate', 'duration', 'sentence'],
        num_rows: <some_number>
    })
})
```
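Once a subset is loaded, the `duration` column gives a quick view of how much audio it holds. The sketch below is only illustrative: `summarize_durations` is a hypothetical helper, and it assumes that `duration` is stored in seconds, which the card does not state.

```python
def summarize_durations(durations):
    """Return count, total, and mean of a sequence of clip durations.

    Assumption: each value is a clip length in seconds.
    """
    values = [float(d) for d in durations]
    count = len(values)
    total = sum(values)
    mean = total / count if count else 0.0
    return {"count": count, "total_seconds": total, "mean_seconds": mean}


# Stand-in values; with a loaded dataset one could pass
# data["train"]["duration"] instead.
example = summarize_durations([1.5, 2.5, 4.0])
print(example)
```

Passing only the `duration` column, rather than iterating over whole rows, should avoid decoding the `audio` field for every sample.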
Provider:
Zarakun
Summary of original information
Dataset overview
Task category
- automatic-speech-recognition
Dataset name
- MangoSpeech
Configurations
- config_name: rozdympodcast
  - Data file: data/rozdympodcast.parquet
- config_name: opodcast
  - Data file: data/opodcast.parquet
- config_name: test
  - Data file: data/test.parquet
Subsets
- The opodcast subset comes from the channel "О! ПОДКАСТ"
- The rozdympodcast subset comes from the channel "Роздум | Подкаст"
- The test subset is a small sample subset
Data loading
- Data file path: data/<your_subset>.parquet
- Load command: `data_files = {"train": "data/<your_subset>.parquet"}`, then `data = load_dataset("Zarakun/youtube_ua_subtitles_test", data_files=data_files)`
- Data structure: a `DatasetDict` with a `train` split whose features are `audio`, `rate`, `duration`, and `sentence`; `num_rows` is `<some_number>`



