
Zarakun/youtube_ua_subtitles_test

Source: Hugging Face. Updated 2024-01-17; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Zarakun/youtube_ua_subtitles_test
Resource description:

---
task_categories:
- automatic-speech-recognition
pretty_name: MangoSpeech
configs:
- config_name: rozdympodcast
  data_files: "data/rozdympodcast.parquet"
- config_name: opodcast
  data_files: "data/opodcast.parquet"
- config_name: test
  data_files: "data/test.parquet"
---

# The list of all subsets in the dataset

Each subset is generated by splitting videos from one particular Ukrainian YouTube channel.
All subsets are in the test split.

- The "opodcast" subset is from the channel "О! ПОДКАСТ"
- The "rozdympodcast" subset is from the channel "Роздум | Подкаст"
- The "test" subset is just a small subset of samples

# Loading a particular subset

```
>>> from datasets import load_dataset
>>> data_files = {"train": "data/<your_subset>.parquet"}
>>> data = load_dataset("Zarakun/youtube_ua_subtitles_test", data_files=data_files)
>>> data
DatasetDict({
    train: Dataset({
        features: ['audio', 'rate', 'duration', 'sentence'],
        num_rows: <some_number>
    })
})
```

Provided by:
Zarakun
Summary of original information

Dataset overview

Task category

  • Automatic speech recognition (automatic-speech-recognition)

Dataset name

  • MangoSpeech

Configuration

  • config_name: rozdympodcast
    • Data file: data/rozdympodcast.parquet
  • config_name: opodcast
    • Data file: data/opodcast.parquet
  • config_name: test
    • Data file: data/test.parquet
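
Because the card declares named configs, a subset can presumably also be selected by passing its config name as the second argument to `load_dataset`. The following is a minimal sketch under that assumption; `REPO_ID`, `CONFIG_FILES`, and `load_config` are hypothetical names introduced here for illustration, and the mapping simply mirrors the configuration listed above.

```python
# Config names and parquet paths copied from the configuration listed above.
REPO_ID = "Zarakun/youtube_ua_subtitles_test"
CONFIG_FILES = {
    "rozdympodcast": "data/rozdympodcast.parquet",
    "opodcast": "data/opodcast.parquet",
    "test": "data/test.parquet",
}


def load_config(config_name: str):
    """Load one named config from the Hub (hypothetical helper).

    Requires network access and the `datasets` package.
    """
    if config_name not in CONFIG_FILES:
        known = ", ".join(sorted(CONFIG_FILES))
        raise ValueError(f"Unknown config {config_name!r}; expected one of: {known}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name)
```

Calling `load_config("test")` would fetch the small sample subset. Checking the name before the download gives a clearer error than a failed Hub request would.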

Subsets

  • The opodcast subset is from the channel "О! ПОДКАСТ"
  • The rozdympodcast subset is from the channel "Роздум | Подкаст"
  • The test subset is a small subset of samples

Data loading

  • Data file path: data/<your_subset>.parquet

  • Loading command (Python): data_files = {"train": "data/<your_subset>.parquet"}; data = load_dataset("Zarakun/youtube_ua_subtitles_test", data_files=data_files)

  • Data structure: DatasetDict({ train: Dataset({ features: ['audio', 'rate', 'duration', 'sentence'], num_rows: <some_number> }) })
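
The loading steps above can be sketched as a small helper that builds the `data_files` mapping for a subset and checks that the loaded split exposes the four listed features. This is only an illustration: `data_files_for`, `missing_features`, and `load_subset` are hypothetical names not found in the dataset card, and the feature check assumes the column names are exactly as listed above.

```python
REPO_ID = "Zarakun/youtube_ua_subtitles_test"
# Feature names as listed in the data structure above.
EXPECTED_FEATURES = ["audio", "rate", "duration", "sentence"]


def data_files_for(subset: str) -> dict:
    """Build the data_files mapping used in the loading command."""
    return {"train": f"data/{subset}.parquet"}


def missing_features(column_names) -> list:
    """Return the expected features that are absent from column_names."""
    return [name for name in EXPECTED_FEATURES if name not in column_names]


def load_subset(subset: str):
    """Load one subset and verify its columns (hypothetical helper).

    Requires network access and the `datasets` package.
    """
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    data = load_dataset(REPO_ID, data_files=data_files_for(subset))
    missing = missing_features(data["train"].column_names)
    if missing:
        raise ValueError(f"Subset {subset!r} is missing features: {missing}")
    return data
```

For example, `load_subset("opodcast")` would return a `DatasetDict` whose single `train` split holds the rows from `data/opodcast.parquet`.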
