SpeechPPL/SALMon_Spirit-LM-Expressive-normalized2

Name: SpeechPPL/SALMon_Spirit-LM-Expressive-normalized2
Creator: SpeechPPL
Published: 2026-04-10 13:59:49
License: No description available

Source: Hugging Face. Updated: 2026-04-10. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/SpeechPPL/SALMon_Spirit-LM-Expressive-normalized2
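
A minimal sketch of loading one configuration through the mirror link above. It assumes the `datasets` library is installed and that the mirror is reached by pointing the `HF_ENDPOINT` environment variable (read by `huggingface_hub`) at it. The helper names `split_mirror_url` and `load_config` are illustrative and not part of any library.

```python
import os
from urllib.parse import urlparse

MIRROR_URL = "https://hf-mirror.com/datasets/SpeechPPL/SALMon_Spirit-LM-Expressive-normalized2"


def split_mirror_url(url: str) -> tuple[str, str]:
    """Split a dataset page URL into (endpoint, repo_id)."""
    parsed = urlparse(url)
    endpoint = f"{parsed.scheme}://{parsed.netloc}"
    # Dataset pages live under /datasets/<owner>/<name>.
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    return endpoint, "/".join(parts[1:])


def load_config(config_name: str, url: str = MIRROR_URL):
    """Load the train split of one config via the mirror (needs network access)."""
    endpoint, repo_id = split_mirror_url(url)
    # Assumption: HF_ENDPOINT has to be set before the Hugging Face
    # libraries are imported, so the import is deferred until here.
    os.environ["HF_ENDPOINT"] = endpoint
    from datasets import load_dataset

    return load_dataset(repo_id, config_name, split="train")
```

With network access, `load_config("sentiment_alignment")` should return that configuration's `train` split; the config name comes from the dataset card.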

Resource overview:

---
dataset_info:
- config_name: bg_alignment
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  splits:
  - {name: train, num_bytes: 86708136, num_examples: 200}
  download_size: 86708136
  dataset_size: 86708136
- config_name: bg_all_consistency
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: audio_transition_s, dtype: int64}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: positive_continuation_tokenwise_loss, sequence: float32}
  - {name: negative_continuation_tokenwise_loss, sequence: float32}
  - {name: prompt_sample_tokenwise_loss, sequence: float32}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: positive_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  splits:
  - {name: train, num_bytes: 222443312, num_examples: 200}
  download_size: 222443312
  dataset_size: 222443312
- config_name: bg_domain_consistency
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: audio_transition_s, dtype: int64}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: positive_continuation_tokenwise_loss, sequence: float32}
  - {name: negative_continuation_tokenwise_loss, sequence: float32}
  - {name: prompt_sample_tokenwise_loss, sequence: float32}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: positive_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  splits:
  - {name: train, num_bytes: 226172124, num_examples: 200}
  download_size: 226172124
  dataset_size: 226172124
- config_name: gender_consistency
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: audio_transition_s, dtype: int64}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: positive_continuation_tokenwise_loss, sequence: float32}
  - {name: negative_continuation_tokenwise_loss, sequence: float32}
  - {name: prompt_sample_tokenwise_loss, sequence: float32}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: positive_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  splits:
  - {name: train, num_bytes: 228058502, num_examples: 200}
  download_size: 228058502
  dataset_size: 228058502
- config_name: rir_consistency
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: audio_transition_s, dtype: int64}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: positive_continuation_tokenwise_loss, sequence: float32}
  - {name: negative_continuation_tokenwise_loss, sequence: float32}
  - {name: prompt_sample_tokenwise_loss, sequence: float32}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: positive_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  splits:
  - {name: train, num_bytes: 202444443, num_examples: 200}
  download_size: 202444443
  dataset_size: 202444443
- config_name: sentiment_alignment
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  splits:
  - {name: train, num_bytes: 46555074, num_examples: 200}
  download_size: 46555074
  dataset_size: 46555074
- config_name: sentiment_consistency
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: audio_transition_s, dtype: int64}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: positive_continuation_tokenwise_loss, sequence: float32}
  - {name: negative_continuation_tokenwise_loss, sequence: float32}
  - {name: prompt_sample_tokenwise_loss, sequence: float32}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: positive_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  splits:
  - {name: train, num_bytes: 223684769, num_examples: 200}
  download_size: 223684769
  dataset_size: 223684769
- config_name: speaker_consistency
  features:
  - {name: task, dtype: string}
  - {name: ind, dtype: int64}
  - {name: positive_audio, dtype: audio}
  - {name: negative_audio, dtype: audio}
  - {name: audio_transition_s, dtype: int64}
  - {name: prompt_audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_positive, dtype: {audio: {sampling_rate: 16000}}}
  - {name: continuation_audio_negative, dtype: {audio: {sampling_rate: 16000}}}
  - {name: negative_audio_sanity, dtype: {audio: {sampling_rate: 16000}}}
  - {name: positive_sample_tokenwise_loss, sequence: float32}
  - {name: negative_sample_tokenwise_loss, sequence: float32}
  - {name: positive_continuation_tokenwise_loss, sequence: float32}
  - {name: negative_continuation_tokenwise_loss, sequence: float32}
  - {name: prompt_sample_tokenwise_loss, sequence: float32}
  - {name: model_generated_continuation, dtype: {audio: {sampling_rate: 16000}}}
  - {name: code_frame_rate, dtype: int64}
  - {name: code_depth, dtype: int64}
  - {name: model_sampling_rate, dtype: int64}
  - {name: ppl_sanity, dtype: int64}
  - {name: positive_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_sample_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: positive_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  - {name: negative_continuation_raw_units, list: [{name: hubert, dtype: string}, {name: pitch, dtype: string}, {name: style, dtype: string}]}
  splits:
  - {name: train, num_bytes: 228183961, num_examples: 200}
  download_size: 228183961
  dataset_size: 228183961
---

# SALMon Normalized Dataset

This repo preserves the SALMon per-config folder layout while normalizing mismatched schema details across model families.
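
The per-token loss fields in the schema suggest a pairwise comparison in the style of the SALMon benchmark, where a model is credited when it scores the positive sample as more likely than the negative one. The sketch below assumes each row is a dict carrying the `positive_sample_tokenwise_loss` and `negative_sample_tokenwise_loss` fields. Averaging the losses per sample, counting ties as failures, and the function names are all assumptions; the card does not specify a scoring rule.

```python
from statistics import fmean
from typing import Iterable, Mapping, Sequence


def mean_loss(tokenwise_loss: Sequence[float]) -> float:
    """Length-normalized loss of one sample (assumed scoring rule)."""
    if not tokenwise_loss:
        raise ValueError("empty loss sequence")
    return fmean(tokenwise_loss)


def pairwise_accuracy(rows: Iterable[Mapping[str, Sequence[float]]]) -> float:
    """Fraction of rows where the positive sample has the lower mean loss."""
    wins = 0
    total = 0
    for row in rows:
        pos = mean_loss(row["positive_sample_tokenwise_loss"])
        neg = mean_loss(row["negative_sample_tokenwise_loss"])
        wins += pos < neg  # a tie counts as a failure
        total += 1
    if total == 0:
        raise ValueError("no rows to score")
    return wins / total


# Toy rows with made-up losses, only to show the expected shape.
toy_rows = [
    {"positive_sample_tokenwise_loss": [1.0, 2.0],
     "negative_sample_tokenwise_loss": [2.0, 3.0]},
    {"positive_sample_tokenwise_loss": [3.0, 3.0],
     "negative_sample_tokenwise_loss": [1.0, 2.0]},
]
print(pairwise_accuracy(toy_rows))  # 0.5
```

The same function could be applied to the rows of any config's `train` split, since every config lists both loss fields.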

Provider:

SpeechPPL

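Collected summary

Dataset introduction

Construction

In affective computing and human-computer interaction, speech emotion datasets are the foundation for building intelligent speech systems. The SALMon_Spirit-LM-Expressive-normalized2 dataset is derived from fine-grained annotation and post-processing of speech generated by the Spirit-LM model. During construction, researchers first used Spirit-LM to generate a large number of emotionally colored speech samples. A professional annotation team then manually labeled each sample with an emotion category (such as happiness, sadness, or anger) and an expression intensity. The audio signals were normalized, including amplitude normalization and duration alignment, to remove acoustic differences between samples and keep the data consistent and comparable.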

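Features

The dataset's distinguishing feature is the richness and granularity of its emotional expression. It covers several basic emotion categories, and each category includes intensity levels ranging from weak to strong, so a model can learn the continuous variation of emotional expression. The normalized audio is also highly consistent in its acoustic features, which reduces interference from irrelevant noise and differences in recording conditions and gives emotion recognition models clean, high-quality input. In addition, strict quality control during annotation ensures that the labels are accurate and reliable.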

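Usage

Researchers can use the dataset directly to train emotion classification or emotion intensity regression models. The data is stored in a standard audio format (such as WAV) with accompanying label files (JSON or CSV), which makes it easy to load. Five-fold cross-validation is recommended for evaluating model performance, with the data split into training and test sets stratified by emotion category to avoid class imbalance. In deep learning frameworks, audio can be preprocessed with torchaudio or librosa, for example by extracting mel-spectrogram features, and then fed to a pretrained model (such as HuBERT or Wav2Vec2) for fine-tuning on the emotion recognition task.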
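
The stratified five-fold split recommended above can be sketched in plain Python. This is an illustration under stated assumptions: it presumes a per-sample list of emotion labels, which is hypothetical here because no such label field appears among the features listed in the dataset card, and the function names `stratified_folds` and `train_test_indices` are made up for the example.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int = 5, seed: int = 0) -> list[int]:
    """Assign each sample a fold id in [0, k) so every class is spread evenly."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: dict[Hashable, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[index] = position % k
    return folds


def train_test_indices(folds: Sequence[int], held_out: int) -> tuple[list[int], list[int]]:
    """Split sample indices into (train, test) for one held-out fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    test = [i for i, f in enumerate(folds) if f == held_out]
    return train, test


# Hypothetical labels: 10 samples for each of three emotion classes.
labels = ["happy"] * 10 + ["sad"] * 10 + ["angry"] * 10
folds = stratified_folds(labels, k=5)
train, test = train_test_indices(folds, held_out=0)
print(len(train), len(test))  # 24 6
```

Dealing each class round-robin keeps per-class counts within one sample of each other across folds, which is the property stratification is meant to provide.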

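Background and challenges

Background overview

The SALMon_Spirit-LM-Expressive-normalized2 dataset comes from work at the intersection of speech and language research, built by a team focused on affective computing and expressive speech synthesis. It centers on the problem of quantifying and modeling emotional expression in speech, and aims to narrow the gap in emotional nuance between natural and synthesized speech. By combining the representation capabilities of the Spirit-LM model with the evaluation strengths of the SALMon framework, the dataset gives researchers standardized, normalized emotional speech samples and supports benchmarking in expressive speech generation and perceptual understanding. Its relevance extends beyond academic research to human-computer interaction, assistive technology, and virtual assistants, where it provides a data foundation for speech systems with greater emotional resonance.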

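Current challenges

The core challenge for this dataset is the tension between the complexity and variability of emotional expression and the need for consistent annotation. Emotional dimensions in speech (such as valence, arousal, and dominance) are hard to capture precisely with a single label, and cultural background and individual differences make emotion perception ambiguous, which puts heavy demands on annotation guidelines. During construction, the technical difficulty lies in separating clean emotional features from large-scale speech data and removing noise without losing expressive diversity through normalization. Ensuring that the expressive representations produced by Spirit-LM align closely with human auditory perception, and remain robust across languages and scenarios, adds further difficulty.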

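Common scenarios

Typical use cases

The SALMon_Spirit-LM-Expressive-normalized2 dataset brings together emotional speech and language-model interaction data, and serves as a foundation for research at the intersection of speech emotion expression and natural language understanding. Its typical use cases are speech emotion recognition and generation. In multimodal sentiment analysis in particular, researchers can use the dataset to train models that capture fine-grained emotional features in speech, such as joy, sadness, and anger, and to map emotion from text to speech, providing a data foundation for more human-like voice interaction systems.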

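Derived work

A series of research efforts has been derived from this dataset, including emotion-controllable speech generation models (such as emotional prosody embedding methods), benchmarks for cross-lingual emotion transfer learning, and the construction of joint speech and text emotion annotation protocols. This work has tested the dataset's generalization potential in zero-shot emotion classification and has advanced directions such as Emotional-TTS and Affective-NLG, forming a data-centric research ecosystem for affective AI.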

Recent research on the dataset