CLAPv2/vggsound_formatted_batch_4
Source: Hugging Face | Updated: 2024-09-19 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/CLAPv2/vggsound_formatted_batch_4
Resource overview:
---
dataset_info:
  features:
  - name: __key__
    dtype: string
  - name: __url__
    dtype: string
  - name: flac
    dtype: audio
  - name: json
    struct:
    - name: original_data
      struct:
      - name: description
        dtype: string
      - name: filename
        dtype: string
      - name: label
        dtype: string
      - name: license
        dtype: string
      - name: split
        dtype: string
      - name: start
        dtype: int64
      - name: title
        dtype: string
      - name: url
        dtype: string
    - name: tag
      sequence: string
    - name: text
      sequence: string
  - name: index
    dtype: string
  - name: datasetname
    dtype: string
  - name: audio
    struct:
    - name: array
      sequence: float64
    - name: path
      dtype: string
    - name: sampling_rate
      dtype: int64
  - name: text
    dtype: string
  - name: raw_text
    sequence:
      sequence: string
  - name: audio_len
    dtype: int64
  splits:
  - name: train
    num_bytes: 47911721496.0
    num_examples: 10000
  download_size: 18702790280
  dataset_size: 47911721496.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
Dataset information:
## Features
1. `__key__`: string
2. `__url__`: string
3. `flac`: audio
4. `json`: struct with the following subfields:
   - `original_data`: struct with the following subfields:
     - `description`: string
     - `filename`: string
     - `label`: string
     - `license`: string
     - `split`: string
     - `start`: int64
     - `title`: string
     - `url`: string
   - `tag`: sequence of strings (sequence<string>)
   - `text`: sequence of strings (sequence<string>)
5. `index`: string
6. `datasetname`: string
7. `audio`: struct with the following subfields:
   - `array`: sequence of float64 (sequence<float64>)
   - `path`: string
   - `sampling_rate`: int64 (sampling rate)
8. `text`: string
9. `raw_text`: nested sequence of strings (sequence<sequence<string>>)
10. `audio_len`: int64 (audio length)
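The nested layout above can be easier to follow with a concrete record in hand. The following is a minimal sketch, assuming one row is exposed as a plain Python dictionary with the field names listed above. Every value in `make_example_record` is invented for illustration, the `flac` field is omitted for brevity, and the helper names `make_example_record` and `summarize` are hypothetical, not part of the dataset.

```python
from typing import Any, Dict


def make_example_record() -> Dict[str, Any]:
    """Build a synthetic record shaped like the schema (all values are invented)."""
    return {
        "__key__": "example_key",
        "__url__": "example_shard.tar",
        "json": {
            "original_data": {
                "description": "",
                "filename": "example.flac",
                "label": "example label",
                "license": "",
                "split": "train",
                "start": 30,
                "title": "example title",
                "url": "",
            },
            "tag": ["example label"],
            "text": ["an example caption"],
        },
        "index": "0",
        "datasetname": "vggsound",
        "audio": {
            "array": [0.0] * 16000,      # one second of silence at 16 kHz
            "path": "example.flac",
            "sampling_rate": 16000,
        },
        "text": "an example caption",
        "raw_text": [["an example caption"]],
        "audio_len": 16000,              # placeholder; the unit is not stated in the card
    }


def summarize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Pull a few commonly needed values out of the nested structs."""
    audio = record["audio"]
    meta = record["json"]["original_data"]
    return {
        "label": meta["label"],
        "start": meta["start"],
        "captions": record["json"]["text"],
        "num_samples": len(audio["array"]),
        # Duration is derived from the waveform itself rather than from audio_len.
        "duration_s": len(audio["array"]) / audio["sampling_rate"],
    }


if __name__ == "__main__":
    print(summarize(make_example_record()))
```

Deriving the duration from `audio.array` and `audio.sampling_rate` avoids relying on the unit of `audio_len`, which the card does not specify.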
## Splits
- Training split (train): 47911721496.0 bytes, 10000 examples
- Total download size: 18702790280 bytes
- Total dataset size: 47911721496.0 bytes
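The raw byte counts are hard to read at a glance. The short sketch below converts them to binary units. It assumes the usual Hugging Face reading of these fields, where the download size is what is fetched and the dataset size is the size after loading; the function name `size_summary` is hypothetical. Under that reading the figures come to about 44.6 GiB for the dataset, about 17.4 GiB to download, and about 4.6 MiB per example on average.

```python
NUM_BYTES = 47_911_721_496       # dataset_size / num_bytes from the card
DOWNLOAD_SIZE = 18_702_790_280   # download_size from the card
NUM_EXAMPLES = 10_000            # num_examples in the train split

GIB = 2**30
MIB = 2**20


def size_summary() -> dict:
    """Convert the card's byte counts into more readable quantities."""
    return {
        "dataset_gib": NUM_BYTES / GIB,
        "download_gib": DOWNLOAD_SIZE / GIB,
        "avg_example_mib": NUM_BYTES / NUM_EXAMPLES / MIB,
        "download_ratio": DOWNLOAD_SIZE / NUM_BYTES,
    }


if __name__ == "__main__":
    for name, value in size_summary().items():
        print(f"{name}: {value:.2f}")
```

The download is roughly 39% of the loaded size, which is worth keeping in mind when planning disk space or choosing between a full download and streaming.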
## Configs
- Default config (`default`): the data files map to the train split, at path `data/train-*`
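As a rough illustration of what the `data/train-*` pattern selects, the sketch below filters a list of file paths with the standard-library `fnmatch` module. The file names in `example_paths` are hypothetical, and the Hub's own pattern resolution may differ in detail. The `stream_train_split` helper additionally assumes the third-party `datasets` package and network access; it is defined here but not called.

```python
from fnmatch import fnmatch

REPO_ID = "CLAPv2/vggsound_formatted_batch_4"
TRAIN_PATTERN = "data/train-*"   # path pattern from the default config


def select_train_files(paths):
    """Return the paths that the train split's pattern would match."""
    return [p for p in paths if fnmatch(p, TRAIN_PATTERN)]


def stream_train_split():
    """Open the train split lazily instead of downloading everything up front.

    Assumes the `datasets` package is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # imported lazily; third-party dependency

    return load_dataset(REPO_ID, split="train", streaming=True)


# Hypothetical repository listing, for illustration only.
example_paths = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]

if __name__ == "__main__":
    print(select_train_files(example_paths))
```

Streaming is a reasonable default for a first look at the data, since the full download is several gigabytes.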
Provider:
CLAPv2



