SpeechPPL/SALMon_GSLM-normalized
Source: Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_GSLM-normalized
Resource description:
---
configs:
- config_name: bg_alignment
  data_files:
  - split: train
    path: bg_alignment/train-*
- config_name: bg_all_consistency
  data_files:
  - split: train
    path: bg_all_consistency/train-*
- config_name: bg_domain_consistency
  data_files:
  - split: train
    path: bg_domain_consistency/train-*
- config_name: gender_consistency
  data_files:
  - split: train
    path: gender_consistency/train-*
- config_name: rir_consistency
  data_files:
  - split: train
    path: rir_consistency/train-*
- config_name: sentiment_alignment
  data_files:
  - split: train
    path: sentiment_alignment/train-*
- config_name: sentiment_consistency
  data_files:
  - split: train
    path: sentiment_consistency/train-*
- config_name: speaker_consistency
  data_files:
  - split: train
    path: speaker_consistency/train-*
dataset_info:
- config_name: bg_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  splits:
  - name: train
    num_bytes: 86999608
    num_examples: 200
  download_size: 86999608
  dataset_size: 86999608
- config_name: bg_all_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 321347192
    num_examples: 200
  download_size: 321347192
  dataset_size: 321347192
- config_name: bg_domain_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 324740898
    num_examples: 200
  download_size: 324740898
  dataset_size: 324740898
- config_name: gender_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 322334278
    num_examples: 200
  download_size: 322334278
  dataset_size: 322334278
- config_name: rir_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 309836139
    num_examples: 200
  download_size: 309836139
  dataset_size: 309836139
- config_name: sentiment_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  splits:
  - name: train
    num_bytes: 46672549
    num_examples: 200
  download_size: 46672549
  dataset_size: 46672549
- config_name: sentiment_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 322075945
    num_examples: 200
  download_size: 322075945
  dataset_size: 322075945
- config_name: speaker_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
    - name: hubert
      dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 322234481
    num_examples: 200
  download_size: 322234481
  dataset_size: 322234481
---
# SALMon Normalized Dataset
This repo preserves the SALMon per-config folder layout while normalizing
mismatched schema details across model families.
Provider:
SpeechPPL
Collected summary
Dataset introduction

Construction
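The SALMon_GSLM-normalized dataset is built for evaluating generative spoken language models (GSLM) and measures model behavior across several dimensions of speech modeling. It consists of contrastive sample pairs: each example contains a positive and a negative audio clip. The eight configurations are background alignment (`bg_alignment`), background consistency (`bg_all_consistency`), background domain consistency (`bg_domain_consistency`), gender consistency (`gender_consistency`), room impulse response consistency (`rir_consistency`), sentiment alignment (`sentiment_alignment`), sentiment consistency (`sentiment_consistency`), and speaker consistency (`speaker_consistency`). Each configuration holds 200 examples in its `train` split and keeps the model-generated continuation audio (sampling rate 22050 Hz) for later analysis. Examples are also annotated with HuBERT units and token-wise losses, which support fine-grained evaluation.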
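The configurations listed in the card's YAML metadata appear to fall into two schema groups: the two `*_alignment` configs lack `audio_transition_s` and the prompt/continuation loss columns that the six `*_consistency` configs carry. The sketch below records that split as read from the metadata; the helper name `extra_columns` is illustrative and not part of the dataset.

```python
# Config names copied from the card's YAML metadata.
CONFIGS = [
    "bg_alignment",
    "bg_all_consistency",
    "bg_domain_consistency",
    "gender_consistency",
    "rir_consistency",
    "sentiment_alignment",
    "sentiment_consistency",
    "speaker_consistency",
]

# Columns that, per the metadata, appear only in the *_consistency configs.
CONSISTENCY_ONLY_COLUMNS = [
    "audio_transition_s",
    "prompt_sample_tokenwise_loss",
    "prompt_sample_raw_units",
    "positive_continuation_tokenwise_loss",
    "positive_continuation_raw_units",
    "negative_continuation_tokenwise_loss",
    "negative_continuation_raw_units",
]


def extra_columns(config_name: str) -> list:
    """Return the columns a config has beyond the shared schema (hypothetical helper)."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name}")
    if config_name.endswith("_consistency"):
        return list(CONSISTENCY_ONLY_COLUMNS)
    return []
```

Code that handles several configs at once can use such a table to avoid reading columns that an alignment config does not have.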
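Features
The core features of SALMon_GSLM-normalized are its normalization and its layered structure. First, by standardizing schema details that were incompatible across model families, the dataset supports fair and reproducible cross-model evaluation. Second, every configuration's examples include 16 kHz prompt audio, positive and negative continuation audio, and sanity-check audio, and retain the model-generated audio, forming a complete evaluation chain from input to output. In addition, the token-wise loss sequences and raw HuBERT units stored with each example give researchers both the discrete units and the loss signal, which helps in analyzing where a model fails or succeeds on a specific attribute.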
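One common way to summarize a token-wise loss sequence is its mean and the corresponding perplexity. The sketch below assumes the stored values are per-token negative log-likelihoods in nats; the card does not state the unit or the reduction, so treat both function names and that assumption as illustrative.

```python
import math
from typing import Sequence


def mean_loss(tokenwise_loss: Sequence[float]) -> float:
    """Average per-token loss; raises on an empty sequence."""
    if len(tokenwise_loss) == 0:
        raise ValueError("empty loss sequence")
    return sum(tokenwise_loss) / len(tokenwise_loss)


def perplexity(tokenwise_loss: Sequence[float]) -> float:
    """exp(mean loss), ASSUMING the losses are natural-log NLL values."""
    return math.exp(mean_loss(tokenwise_loss))
```

Under this assumption, a lower perplexity on the positive sample than on the negative one would indicate that the model finds the positive sample more likely.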
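Usage
The dataset is aimed mainly at researchers who evaluate and improve speech generation models. With the Hugging Face Datasets library, users can load the data for a specific evaluation task by configuration name, for example `bg_alignment` or `speaker_consistency`. Each example provides positive and negative audio with the corresponding loss sequences, from which researchers can compute a model's preference or consistency metric for the target attribute. The `model_generated_continuation` field provides direct material for analyzing the naturalness and coherence of model output, so a speech generation system can be benchmarked from several angles.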
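A minimal sketch of this workflow: load one configuration by name, then count how often the positive sample receives a lower mean token-wise loss than the negative one. The scoring rule (comparing mean losses) is an assumption, since the card does not define the official metric, and `load_config`, `prefers_positive`, and `pairwise_accuracy` are hypothetical helper names.

```python
from typing import Iterable, Mapping, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def prefers_positive(example: Mapping) -> bool:
    """True if the positive sample has the lower mean token-wise loss (assumed rule)."""
    pos = _mean(example["positive_sample_tokenwise_loss"])
    neg = _mean(example["negative_sample_tokenwise_loss"])
    return pos < neg


def pairwise_accuracy(examples: Iterable[Mapping]) -> float:
    """Fraction of examples in which the positive sample is preferred."""
    results = [prefers_positive(ex) for ex in examples]
    return sum(results) / len(results)


def load_config(config_name: str = "bg_alignment"):
    """Load one configuration's train split from the Hugging Face Hub."""
    # Imported lazily so the scoring helpers above work without the library.
    from datasets import load_dataset

    return load_dataset("SpeechPPL/SALMon_GSLM-normalized", config_name, split="train")
```

For example, `pairwise_accuracy(load_config("speaker_consistency"))` would score one configuration. Iterating over full rows also decodes the audio columns, so selecting only the two loss columns first (for instance with `Dataset.select_columns`) is likely to be faster.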
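Background and Challenges
Background overview
In speech generation and understanding, the rise of generative spoken language models (GSLM) opened a new paradigm for textless speech processing. The SALMon_GSLM-normalized dataset was created by its research team to systematically evaluate how consistent and well aligned speech generation models are across semantic and acoustic attributes. Its eight task configurations cover consistency and alignment evaluation for background noise, gender, room impulse response (RIR), sentiment, and speaker, providing a benchmark for how well a model maintains global and local consistency when continuing speech. The normalization makes features compatible across model families, which helps standardize the evaluation of spoken language models and supports the study of their robustness and semantic preservation.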
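Current challenges
The core challenge the dataset addresses is that existing speech generation models often fail to keep non-content features, such as background attributes (for example the noise scene), speaker characteristics, or emotional state, consistent over a long audio continuation. Specifically, models are prone to abrupt background changes, gender confusion, or sentiment drift when generating speech, which limits their practical usefulness. Building the dataset posed difficulties of its own: generating positive/negative pairs for each consistency task requires tight control of variables so that only the target attribute changes while everything else stays fixed; normalizing features across model families requires unifying differing sampling rates, encoding schemes, and loss computations; and with only 200 examples per configuration, each example has to be of high quality and appropriately difficult.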
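To make the effect of the 200-example size concrete, the sketch below computes a Wilson score interval for an accuracy measured on one configuration. This is a standard statistical tool brought in for illustration, not something the dataset card prescribes; the function name is hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half, center + half
```

With 200 examples and an accuracy near chance, the interval spans roughly seven percentage points on either side, so small accuracy differences between models on a single configuration should be read with caution.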
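Common scenarios
Classic use cases
In speech generation and understanding, SALMon_GSLM-normalized provides a standardized platform for evaluating how well generative spoken language models (GSLM) handle fine-grained attributes. The dataset defines eight configurations: background alignment, background consistency, background domain consistency, gender consistency, room impulse response consistency, sentiment alignment, sentiment consistency, and speaker consistency. Each configuration contains positive/negative sample pairs, prompt audio, and model-generated continuation audio, together with token-wise losses and raw unit sequences based on HuBERT units, so researchers can systematically check whether a model preserves or changes specific acoustic and semantic attributes during generation. The dataset is particularly suited to studying whether self-supervised speech models can continue speech with attribute control and without text supervision, and it provides data for understanding how much semantic information discrete speech representations carry.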
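Practical applications
In practice, SALMon_GSLM-normalized serves as a validation resource for attribute control in speech interaction systems. In a voice assistant, for example, it can be used to evaluate whether a model can change emotional expression while preserving the speaker's timbre, or keep semantic coherence when the background environment switches. For speech synthesis and editing systems, it helps developers test environmental robustness in audio continuation, checking that generated audio stays natural when the scene moves from a quiet room to a noisy outdoor setting. In industrial work on speech style transfer, personalized voice customization, and multimodal human-computer interaction, the attribute-consistency evaluation the dataset provides offers a principled basis for quality control during productization.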
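Derived and related work
The dataset has prompted a range of follow-up work. First, building on its standardized evaluation framework, researchers have proposed attribute-disentanglement methods for discrete speech representations that aim to give models independent control over different acoustic attributes. Second, the dataset has been used to improve training strategies for generative speech models, for example by adding a contrastive learning loss to sharpen the separation between positive and negative sample representations. Third, the token-level analysis information in the dataset has led to studies of the semantic space structure of HuBERT units, revealing how discrete units map to acoustic attributes. Finally, the way the dataset is organized serves as a reference pattern for building multi-dimensional, fine-grained speech evaluation benchmarks, and has encouraged methodological work on tasks such as emotion-controllable speech generation and cross-speaker style transfer.
The above content was collected and summarized by 遇见数据集.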



