Masioki/SLUE-processed
Source: Hugging Face. Updated 2024-06-06. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/Masioki/SLUE-processed
Resource description:
---
language:
- en
dataset_info:
  config_name: hvb
  features:
  - name: utt_index
    dtype: int32
  - name: channel
    dtype: int32
  - name: role
    dtype: string
  - name: start_ms
    dtype: int32
  - name: duration_ms
    dtype: int32
  - name: intent
    dtype: string
  - name: dialog_acts
    sequence: string
  - name: distilbert-uncased-embeddings
    sequence:
      sequence: float32
  - name: Phi-3-mini-embeddings
    sequence:
      sequence: float32
  - name: log_pitch_pov
    sequence: float32
  - name: log_pitch_der
    sequence: float32
  - name: log_total_e
    sequence: float32
  - name: log_total_e_lower_bands
    sequence: float32
  - name: log_total_e_upper_bands
    sequence: float32
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: speaker
    dtype: string
  - name: conversation
    dtype: string
  splits:
  - name: train
    num_bytes: 2413918025.28
    num_examples: 11344
  - name: validation
    num_bytes: 348479898.3
    num_examples: 1690
  - name: test
    num_bytes: 1277200426.27
    num_examples: 6121
  - name: asr_train
    num_bytes: 2560385950.28
    num_examples: 11344
  - name: asr_validation
    num_bytes: 373336758.3
    num_examples: 1690
  - name: asr_test
    num_bytes: 1343115200.27
    num_examples: 6121
  download_size: 8721365503
  dataset_size: 8316436258.700001
configs:
- config_name: hvb
  data_files:
  - split: train
    path: hvb/train-*
  - split: validation
    path: hvb/validation-*
  - split: test
    path: hvb/test-*
  - split: asr_train
    path: hvb/asr_train-*
  - split: asr_validation
    path: hvb/asr_validation-*
  - split: asr_test
    path: hvb/asr_test-*
---
Provider:
Masioki
Summary of original information
Dataset overview
Dataset configuration
- Config name: hvb
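
A minimal sketch of how the hvb config might be loaded with the Hugging Face `datasets` library. The repository ID, config name, and split names come from the card; the helper `load_split` and the choice of streaming mode are illustrative assumptions, and the call needs `datasets` installed plus network access.

```python
# Sketch: load one split of the hvb config with the Hugging Face `datasets` library.
# Assumes `pip install datasets` and network access to the Hub.

REPO_ID = "Masioki/SLUE-processed"  # repository name from the page title
CONFIG = "hvb"                      # the only config listed in the card
SPLITS = ["train", "validation", "test",
          "asr_train", "asr_validation", "asr_test"]


def load_split(split: str, streaming: bool = True):
    """Return one split (hypothetical helper, not part of the dataset).

    Streaming is an assumption made here to avoid fetching the whole
    download (about 8.7 GB according to the card) before reading a row.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the constants above are usable without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, CONFIG, split=split, streaming=streaming)


# Example use (requires network access):
#   ds = load_split("validation")
#   first = next(iter(ds))
#   print(sorted(first.keys()))
```

Decoding the `audio` column may require additional audio dependencies, depending on the installed `datasets` version.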
Dataset features
- utt_index: integer (int32)
- channel: integer (int32)
- role: string
- start_ms: integer (int32)
- duration_ms: integer (int32)
- intent: string
- dialog_acts: sequence of strings
- distilbert-uncased-embeddings: sequence of float sequences (float32)
- Phi-3-mini-embeddings: sequence of float sequences (float32)
- log_pitch_pov: sequence of floats (float32)
- log_pitch_der: sequence of floats (float32)
- log_total_e: sequence of floats (float32)
- log_total_e_lower_bands: sequence of floats (float32)
- log_total_e_upper_bands: sequence of floats (float32)
- audio: audio data, sampling rate 16000 Hz
- text: string
- speaker: string
- conversation: string
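
The feature list can be turned into a quick shape check. The sketch below is an assumption-laden illustration, not part of the dataset: the names `EXPECTED` and `schema_problems` are hypothetical, rows are assumed to arrive as plain Python dicts with lists for sequences, and the `audio` column is skipped because its decoded form depends on the loader. Note that the two embedding columns are nested (a sequence of float sequences), while the pitch and energy columns are flat.

```python
# Sketch: check a row against the column kinds listed in the card.
# "scalar" = single value, "seq" = flat list, "seq2d" = list of lists.

EXPECTED = {
    "utt_index": ("scalar", int),
    "channel": ("scalar", int),
    "role": ("scalar", str),
    "start_ms": ("scalar", int),
    "duration_ms": ("scalar", int),
    "intent": ("scalar", str),
    "dialog_acts": ("seq", str),
    "distilbert-uncased-embeddings": ("seq2d", float),
    "Phi-3-mini-embeddings": ("seq2d", float),
    "log_pitch_pov": ("seq", float),
    "log_pitch_der": ("seq", float),
    "log_total_e": ("seq", float),
    "log_total_e_lower_bands": ("seq", float),
    "log_total_e_upper_bands": ("seq", float),
    "text": ("scalar", str),
    "speaker": ("scalar", str),
    "conversation": ("scalar", str),
}


def _ok(value, kind, typ):
    """Return True if `value` has the given kind and element type."""
    if kind == "scalar":
        return isinstance(value, typ)
    if kind == "seq":
        return isinstance(value, list) and all(isinstance(v, typ) for v in value)
    # seq2d: every inner element must itself be a flat list of `typ`.
    return isinstance(value, list) and all(_ok(v, "seq", typ) for v in value)


def schema_problems(row: dict) -> list:
    """Return the names of columns that are missing or have the wrong shape."""
    return [name for name, (kind, typ) in EXPECTED.items()
            if name not in row or not _ok(row[name], kind, typ)]
```

An empty list from `schema_problems(row)` means every listed column is present with the expected nesting.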
Dataset splits
- train: 11344 examples, 2413918025.28 bytes
- validation: 1690 examples, 348479898.3 bytes
- test: 6121 examples, 1277200426.27 bytes
- asr_train: 11344 examples, 2560385950.28 bytes
- asr_validation: 1690 examples, 373336758.3 bytes
- asr_test: 6121 examples, 1343115200.27 bytes
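
The split figures above allow a rough per-example size estimate. The sketch below only does arithmetic on the numbers from the card; the names `SPLIT_STATS` and `mean_bytes_per_example` are illustrative. Each `asr_` split has the same example count as its counterpart but a somewhat larger byte size; the card does not say why.

```python
# Split statistics copied from the card: name -> (num_examples, num_bytes).
SPLIT_STATS = {
    "train": (11344, 2413918025.28),
    "validation": (1690, 348479898.3),
    "test": (6121, 1277200426.27),
    "asr_train": (11344, 2560385950.28),
    "asr_validation": (1690, 373336758.3),
    "asr_test": (6121, 1343115200.27),
}


def mean_bytes_per_example(stats):
    """Return the average size in bytes of one example for each split."""
    return {name: num_bytes / n for name, (n, num_bytes) in stats.items()}


for name, avg in mean_bytes_per_example(SPLIT_STATS).items():
    print(f"{name:>15}: {avg / 1024:8.1f} KiB per example")
```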
Dataset size
- Download size: 8721365503 bytes
- Dataset size: 8316436258.700001 bytes
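
As a consistency check, the six per-split byte counts listed in the card should add up to the reported dataset size. The sketch below verifies that and converts both totals to GiB; the variable names and the helper `to_gib` are illustrative, and the numbers are copied from the card.

```python
import math

# Per-split num_bytes values as listed in the card.
SPLIT_BYTES = [2413918025.28, 348479898.3, 1277200426.27,
               2560385950.28, 373336758.3, 1343115200.27]
DATASET_SIZE = 8316436258.700001   # dataset_size from the card
DOWNLOAD_SIZE = 8721365503         # download_size from the card


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


total = sum(SPLIT_BYTES)
print("splits sum to dataset_size:",
      math.isclose(total, DATASET_SIZE, rel_tol=1e-9))
print(f"dataset size:  {to_gib(DATASET_SIZE):.2f} GiB")
print(f"download size: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
```

Here the download size is the larger of the two figures, so free disk space should be planned around it.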
Data file paths
- train: hvb/train-*
- validation: hvb/validation-*
- test: hvb/test-*
- asr_train: hvb/asr_train-*
- asr_validation: hvb/asr_validation-*
- asr_test: hvb/asr_test-*
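
The path patterns above are globs, and each one is anchored at the `hvb/` prefix, so `hvb/train-*` does not also pick up `asr_train` files. The sketch below illustrates this with Python's standard `fnmatch` module, which is assumed here to behave like the Hub's own pattern resolution for patterns this simple. The shard file names are hypothetical; the card does not list actual file names.

```python
from fnmatch import fnmatchcase

# Split -> glob pattern, copied from the card.
DATA_FILES = {
    "train": "hvb/train-*",
    "validation": "hvb/validation-*",
    "test": "hvb/test-*",
    "asr_train": "hvb/asr_train-*",
    "asr_validation": "hvb/asr_validation-*",
    "asr_test": "hvb/asr_test-*",
}


def splits_for(path):
    """Return the names of all splits whose pattern matches `path`."""
    return [split for split, pattern in DATA_FILES.items()
            if fnmatchcase(path, pattern)]


# Hypothetical shard names, for illustration only.
print(splits_for("hvb/train-00000-of-00005.parquet"))      # ['train']
print(splits_for("hvb/asr_train-00000-of-00006.parquet"))  # ['asr_train']
```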



