yongchanskii/youtube-data-various-domain
Source: Hugging Face. Updated 2024-06-14; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/yongchanskii/youtube-data-various-domain
Resource description:
---
dataset_info:
  features:
  - name: source
    dtype: string
  - name: channelName
    dtype: string
  - name: category
    dtype: string
  - name: title
    dtype: string
  - name: videoId
    dtype: string
  - name: domainTag
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: transcriptionPath
    dtype: string
  - name: start
    dtype: float64
  - name: end
    dtype: float64
  - name: WER
    dtype: float64
  - name: CER
    dtype: float64
  - name: hypotheseText
    dtype: string
  - name: referenceText
    dtype: string
  - name: referenceTextLength
    dtype: int64
  - name: hypotheseTextLength
    dtype: int64
  splits:
  - name: train
    num_bytes: 555931731.52
    num_examples: 2288
  - name: test
    num_bytes: 137842029.0
    num_examples: 572
  download_size: 691274587
  dataset_size: 693773760.52
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# Dataset Card for "youtube-data-various-domain"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
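This dataset provides 16 features per example: the string fields source, channelName, category, title, videoId, domainTag, and transcriptionPath; an audio field sampled at 16 kHz with float64 start and end fields; and the reference and hypothesis transcripts (referenceText, hypotheseText) together with their lengths and float64 WER and CER values. The dataset is divided into a training set of 2288 examples and a test set of 572 examples, an 80/20 split. The total dataset size is about 694 MB and the download size about 691 MB. The single configuration, default, maps the train and test splits to data/train-* and data/test-*.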
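As a minimal sketch of how the metadata above might be used, the following Python loads the dataset with the Hugging Face `datasets` library and checks the split sizes against the counts recorded in the card. The helper names (`split_sizes_match`, `load_youtube_dataset`, `preview_text_columns`) are hypothetical and not part of the dataset; the repository ID, split names, and column names are taken from the card, and loading assumes the repository is still publicly available.

```python
# Sketch only: split names and counts come from the card's metadata.
# All helper names in this block are hypothetical.
EXPECTED_EXAMPLES = {"train": 2288, "test": 572}


def split_sizes_match(splits):
    """Return True if every expected split is present with the expected row count."""
    return all(
        name in splits and len(splits[name]) == count
        for name, count in EXPECTED_EXAMPLES.items()
    )


def load_youtube_dataset():
    """Download the dataset from the Hugging Face Hub.

    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily so the checker works offline

    return load_dataset("yongchanskii/youtube-data-various-domain")


def preview_text_columns(ds, split="train"):
    """Return the first row's transcript fields without decoding the audio column."""
    text_only = ds[split].select_columns(["referenceText", "hypotheseText", "WER", "CER"])
    return text_only[0]
```

Calling `load_youtube_dataset()` and passing the result to `split_sizes_match` is one way to confirm that a local copy matches the card. Selecting only the text columns in `preview_text_columns` avoids decoding audio, which would otherwise require an audio backend to be installed.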
Provider:
yongchanskii
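Original information summary
Dataset overview
The README provided on the dataset's detail page currently reads:
More Information needed
Because of this lack of information, no further dataset details can be provided.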



