SpeechPPL/SALMon_Flow-SLM-1B-Extended-normalized
Hugging Face · updated 2026-04-10 · indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_Flow-SLM-1B-Extended-normalized
Resource overview:
---
configs:
- config_name: bg_alignment
  data_files:
  - split: train
    path: bg_alignment/train-*
- config_name: bg_all_consistency
  data_files:
  - split: train
    path: bg_all_consistency/train-*
- config_name: bg_domain_consistency
  data_files:
  - split: train
    path: bg_domain_consistency/train-*
- config_name: gender_consistency
  data_files:
  - split: train
    path: gender_consistency/train-*
- config_name: rir_consistency
  data_files:
  - split: train
    path: rir_consistency/train-*
- config_name: sentiment_alignment
  data_files:
  - split: train
    path: sentiment_alignment/train-*
- config_name: sentiment_consistency
  data_files:
  - split: train
    path: sentiment_consistency/train-*
- config_name: speaker_consistency
  data_files:
  - split: train
    path: speaker_consistency/train-*
dataset_info:
- config_name: bg_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 86639819
    num_examples: 200
  download_size: 86639819
  dataset_size: 86639819
- config_name: bg_all_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1269408501
    num_examples: 200
  download_size: 1269408501
  dataset_size: 1269408501
- config_name: bg_domain_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1270504334
    num_examples: 200
  download_size: 1270504334
  dataset_size: 1270504334
- config_name: gender_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1315571898
    num_examples: 200
  download_size: 1315571898
  dataset_size: 1315571898
- config_name: rir_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1300026144
    num_examples: 200
  download_size: 1300026144
  dataset_size: 1300026144
- config_name: sentiment_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 46471520
    num_examples: 200
  download_size: 46471520
  dataset_size: 46471520
- config_name: sentiment_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1311269667
    num_examples: 200
  download_size: 1311269667
  dataset_size: 1311269667
- config_name: speaker_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1316303506
    num_examples: 200
  download_size: 1316303506
  dataset_size: 1316303506
---
# SALMon Normalized Dataset
This repo preserves the SALMon per-config folder layout while normalizing
mismatched schema details across model families.
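The sketch below gives a quick overview of that layout. It copies the eight config names and their `num_bytes` values from the YAML header and rebuilds each config's `data_files` glob. `CONFIG_BYTES`, `data_files_pattern`, and `total_gb` are hypothetical helper names used only for illustration; they are not part of the repository.

```python
# Per-config byte sizes copied from the `dataset_info` YAML above (train split only).
CONFIG_BYTES = {
    "bg_alignment": 86_639_819,
    "bg_all_consistency": 1_269_408_501,
    "bg_domain_consistency": 1_270_504_334,
    "gender_consistency": 1_315_571_898,
    "rir_consistency": 1_300_026_144,
    "sentiment_alignment": 46_471_520,
    "sentiment_consistency": 1_311_269_667,
    "speaker_consistency": 1_316_303_506,
}


def data_files_pattern(config: str, split: str = "train") -> str:
    """Rebuild the `path` glob from the `configs` YAML block, e.g. 'bg_alignment/train-*'."""
    if config not in CONFIG_BYTES:
        raise ValueError(f"unknown config: {config!r}")
    return f"{config}/{split}-*"


def total_gb(configs=None) -> float:
    """Total size in decimal gigabytes (1 GB = 10**9 bytes) for the chosen configs (default: all)."""
    chosen = CONFIG_BYTES if configs is None else {c: CONFIG_BYTES[c] for c in configs}
    return sum(chosen.values()) / 1e9


for name in CONFIG_BYTES:
    print(f"{name:24s} {data_files_pattern(name):32s} {CONFIG_BYTES[name] / 1e9:6.3f} GB")
print(f"total: {total_gb():.2f} GB")
```

Run as-is, it prints one line per config and ends with `total: 7.92 GB`. The two alignment configs together come to about 0.13 GB, while each consistency config is about 1.3 GB. Loading one alignment config is therefore a cheap way to try the data first.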
Provider:
SpeechPPL
Compiled summary
Dataset introduction

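Construction
SALMon_Flow-SLM-1B-Extended-normalized is a systematically constructed resource for evaluating speech language models. It builds on SALMon: it keeps SALMon's per-config folder layout and normalizes schema details that do not match across model families. It contains eight configs: background alignment (`bg_alignment`), background consistency (`bg_all_consistency`), background-domain consistency (`bg_domain_consistency`), gender consistency (`gender_consistency`), room impulse response consistency (`rir_consistency`), sentiment alignment (`sentiment_alignment`), sentiment consistency (`sentiment_consistency`), and speaker consistency (`speaker_consistency`). Each config has 200 examples in a single `train` split. Every example stores the positive and negative audio, the corresponding prompt and continuation audio, six model-generated continuations, and token-wise loss sequences. Together these give fine-grained data for evaluating how consistently a model handles acoustic properties. The repository name suggests that the losses and generated continuations come from a model named Flow-SLM-1B-Extended.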
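The configs do not all share one schema. The minimal sketch below reads the column sets off the `features` lists in the YAML header. The set names and `expected_columns` are hypothetical. One way to use them is to compare `expected_columns(name)` with `set(ds.column_names)` after loading, which catches schema drift.

```python
# Columns declared by all eight configs in the YAML header.
COMMON_COLUMNS = {
    "task", "ind", "positive_audio", "negative_audio", "prompt_audio",
    "continuation_audio_positive", "continuation_audio_negative", "negative_audio_sanity",
    "positive_sample_tokenwise_loss", "negative_sample_tokenwise_loss",
    "code_frame_rate", "code_depth", "model_sampling_rate", "ppl_sanity",
    "model_generated_continuation",
    *(f"model_generated_continuation{i}" for i in range(1, 6)),
}
# Extra columns declared only by the six *_consistency configs.
CONSISTENCY_ONLY = {
    "audio_transition_s", "prompt_sample_tokenwise_loss",
    "positive_continuation_tokenwise_loss", "negative_continuation_tokenwise_loss",
}
ALIGNMENT_CONFIGS = {"bg_alignment", "sentiment_alignment"}
CONSISTENCY_CONFIGS = {
    "bg_all_consistency", "bg_domain_consistency", "gender_consistency",
    "rir_consistency", "sentiment_consistency", "speaker_consistency",
}


def expected_columns(config: str) -> set:
    """Return the column set a config should expose, according to the YAML schema."""
    if config in ALIGNMENT_CONFIGS:
        return set(COMMON_COLUMNS)
    if config in CONSISTENCY_CONFIGS:
        return COMMON_COLUMNS | CONSISTENCY_ONLY
    raise ValueError(f"unknown config: {config!r}")


print(len(expected_columns("bg_alignment")), len(expected_columns("speaker_consistency")))
```

The script prints `20 24`: the alignment configs expose 20 columns, and the consistency configs add four more.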
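Usage
Each subset can be loaded by config name with the Hugging Face `datasets` library. For example, `load_dataset('SpeechPPL/SALMon_Flow-SLM-1B-Extended-normalized', 'bg_alignment')` loads the background-alignment config. Every config provides a single `train` split, whose examples contain the audio together with the loss values. Researchers can analyze model behavior from the token-wise loss sequences of the positive and negative samples. They can also use the generated continuations for listening tests or automatic evaluation. The prompt, continuation, sanity, and generated-continuation audio columns declare `sampling_rate: 16000`, so they are resampled to 16 kHz when decoded. `positive_audio` and `negative_audio` declare no sampling rate and are decoded at their stored rate. Note that the normalization aligns schemas across model families, not across configs. The two alignment configs lack the `audio_transition_s`, `prompt_sample_tokenwise_loss`, `positive_continuation_tokenwise_loss`, and `negative_continuation_tokenwise_loss` columns that the six consistency configs provide.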
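The sketch below loads one config and assumes the Hugging Face `datasets` package is installed. `load_dataset(path, name, split=...)`, `Dataset.cast_column`, and `datasets.Audio(sampling_rate=...)` are real `datasets` APIs. The wrapper `load_salmon_config` and its injectable `loader` argument are hypothetical, added so the function can be exercised without network access. The exact form of a decoded audio value (a dict with `array` and `sampling_rate`, or a decoder object) may depend on the installed `datasets` version.

```python
REPO_ID = "SpeechPPL/SALMon_Flow-SLM-1B-Extended-normalized"
CONFIGS = (
    "bg_alignment", "bg_all_consistency", "bg_domain_consistency", "gender_consistency",
    "rir_consistency", "sentiment_alignment", "sentiment_consistency", "speaker_consistency",
)


def load_salmon_config(config: str, loader=None, resample_pairs_to=None):
    """Load the single `train` split of one config.

    `loader` defaults to `datasets.load_dataset`; it is injectable (hypothetical
    design choice) so the function can be tested offline with a stub.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if loader is None:
        from datasets import load_dataset  # lazy import: requires `pip install datasets`
        loader = load_dataset
    ds = loader(REPO_ID, config, split="train")
    if resample_pairs_to is not None:
        # positive_audio / negative_audio declare no sampling rate in the YAML,
        # so they decode at their stored rate unless cast explicitly.
        from datasets import Audio
        for col in ("positive_audio", "negative_audio"):
            ds = ds.cast_column(col, Audio(sampling_rate=resample_pairs_to))
    return ds
```

For example, `ds = load_salmon_config("bg_alignment", resample_pairs_to=16000)` would download the roughly 87 MB background-alignment config. It would also make `positive_audio` and `negative_audio` decode at 16 kHz like the other audio columns.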
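Background and challenges
Overview
In neural audio coding and generative speech modeling, keeping model outputs consistent and controllable at a fine-grained level is a core open problem. SALMon_Flow-SLM-1B-Extended-normalized is published by SpeechPPL and built on SALMon. It focuses on evaluating how well large speech language models (SLMs) stay aligned with a range of acoustic and semantic properties. It was created during a period of rapid progress in generative speech models. Its central question is how to systematically measure a model's generation consistency with respect to background noise, speaker, gender, sentiment, room impulse response (RIR), and domain. The dataset covers eight sub-tasks, such as background alignment and sentiment consistency. Each sub-task provides paired positive and negative audio samples together with token-wise loss values, which give a fine-grained signal for analyzing model behavior. This makes it a useful resource for moving speech generation models from rough imitation toward precise, controllable expression.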
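The positive/negative pairs can be scored with a few lines of code. The minimal sketch below implements SALMon-style pairwise scoring. It assumes the `*_sample_tokenwise_loss` columns hold per-token negative log-likelihoods, so lower loss means the model finds the sample more likely. This card does not say whether scores should compare summed or length-normalized loss, so the sketch offers both. `preference` and `pairwise_accuracy` are hypothetical helpers.

```python
from typing import Iterable, Mapping, Sequence

Example = Mapping[str, Sequence[float]]


def preference(example: Example, reduce: str = "sum") -> bool:
    """True if the positive sample gets a lower loss (higher likelihood) than the negative one.

    Assumption: token-wise losses are per-token negative log-likelihoods.
    reduce="sum" compares total loss; reduce="mean" compares length-normalized loss.
    """
    pos = example["positive_sample_tokenwise_loss"]
    neg = example["negative_sample_tokenwise_loss"]
    if reduce == "sum":
        return sum(pos) < sum(neg)
    if reduce == "mean":
        return sum(pos) / len(pos) < sum(neg) / len(neg)
    raise ValueError(f"unknown reduce: {reduce!r}")


def pairwise_accuracy(examples: Iterable[Example], reduce: str = "sum") -> float:
    """Fraction of pairs in which the positive sample is preferred."""
    results = [preference(ex, reduce) for ex in examples]
    if not results:
        raise ValueError("no examples")
    return sum(results) / len(results)


toy = [
    {"positive_sample_tokenwise_loss": [1.0, 2.0], "negative_sample_tokenwise_loss": [2.0, 2.0]},
    {"positive_sample_tokenwise_loss": [3.0, 3.0, 3.0], "negative_sample_tokenwise_loss": [4.0, 4.0]},
]
print(pairwise_accuracy(toy, "sum"), pairwise_accuracy(toy, "mean"))
```

In the toy data, the second pair flips with the reduction: its summed losses are 9 vs 8, but its mean losses are 3 vs 4. The script therefore prints `0.5 1.0`. When positive and negative samples differ in length, the choice matters and should be reported alongside the score.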
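Current challenges
The main domain challenge the dataset addresses is that speech language models struggle to keep several attributes consistent in complex acoustic environments. When continuing speech, they often lose or confuse the speaker's identity, the emotional tone, or the background acoustic conditions, so the output sounds incoherent or does not match the prompt. On the construction side, each sub-task contains only 200 examples. Although the split is named `train`, these are evaluation pairs. The small size therefore mainly limits the statistical precision of per-task scores rather than posing a training or generalization problem. Preparing the data involves further technical difficulties. Scoring and feature normalization must be reconciled across model families (for example, models with different decoder architectures). Negative samples must also provide a meaningful contrast rather than mere noise. Together these challenges point to one question: how to use limited but high-quality evaluation data to reliably measure whether large models achieve robust, controllable speech generation.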
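A confidence interval shows concretely what 200 pairs per config means for precision. The sketch below computes a normal-approximation (Wald) interval for an accuracy, assuming the pairs are independent. The Wald interval is a textbook choice swapped in for illustration; a Wilson interval behaves better near 0 or 1. `accuracy_ci` is a hypothetical helper.

```python
import math


def accuracy_ci(correct: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation (Wald) confidence interval for an accuracy over n independent pairs."""
    if not 0 <= correct <= n or n == 0:
        raise ValueError("need 0 <= correct <= n and n > 0")
    p = correct / n
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error of the proportion
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Worst case (p = 0.5) for one 200-pair config: 1.96 * sqrt(0.25 / 200) is about 0.069.
low, high = accuracy_ci(100, 200)
print(f"{low:.3f} {high:.3f}")
```

At 100 correct out of 200, the script prints `0.431 0.569`, a half-width of about 6.9 percentage points. Gaps of a few points between models on a single config may therefore be within noise.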
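Common use cases
Typical use
In speech generation and audio understanding, SALMon_Flow-SLM-1B-Extended-normalized provides a standardized benchmark for evaluating and improving conditional audio continuation with speech language models (SLMs). Its eight configs cover background alignment, background consistency, background-domain consistency, gender consistency, reverberation (RIR) consistency, sentiment alignment, sentiment consistency, and speaker consistency. Each config includes positive/negative pairs, prompt audio, and six model-generated continuations (`model_generated_continuation` through `model_generated_continuation5`). Researchers can use it to test systematically whether a model's continuations stay coherent and controllable while preserving a given acoustic or semantic property. This supports progress toward fine-grained attribute control in speech generation.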
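Each example stores its six generated continuations in separate columns. The small sketch below gathers them and includes a duration helper. The duration helper assumes decoded audio arrives as a dict of the form `{'array': ..., 'sampling_rate': ...}`, which may depend on the `datasets` version. Both helper names are hypothetical.

```python
# Column names of the six generated continuations, as declared in the YAML header.
GENERATED_COLUMNS = ["model_generated_continuation"] + [
    f"model_generated_continuation{i}" for i in range(1, 6)
]


def generated_continuations(example: dict) -> list:
    """Return the generated-continuation entries of one example, skipping missing or None values."""
    return [example[c] for c in GENERATED_COLUMNS if example.get(c) is not None]


def duration_seconds(audio: dict) -> float:
    """Length in seconds of a decoded audio dict (assumed form: {'array': ..., 'sampling_rate': ...})."""
    return len(audio["array"]) / audio["sampling_rate"]


# Toy example: two 1-second clips at 16 kHz and one missing continuation.
clip = {"array": [0.0] * 16000, "sampling_rate": 16000}
toy = {
    "model_generated_continuation": clip,
    "model_generated_continuation1": clip,
    "model_generated_continuation2": None,
}
print([duration_seconds(a) for a in generated_continuations(toy)])
```

The toy example prints `[1.0, 1.0]`. On real data, comparing the durations or contents of the six continuations gives a quick check of how much the model's samples vary for the same prompt.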
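Research problems addressed
The dataset targets the attribute-preservation problem that speech language models face in conditional generation. Specifically, it addresses how to quantify a model's coherence with respect to background sound, speaker identity, emotional expression, reverberation, and domain style. It provides fine-grained measurements: losses for contrasting positive and negative samples, and token-wise loss sequences. These help break down how much a model relies on key acoustic cues during generation and how well it preserves them. Because the schema is normalized across model families, different methods can be compared fairly. This provides an experimental basis for studying how a model's internal representations map to external acoustic properties.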
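In the six consistency configs, `audio_transition_s` and `code_frame_rate` make it possible to locate where in a sample the loss difference arises. The sketch below rests on two loud assumptions. First, the sample-level loss sequences have one entry per codec frame, so the boundary index is `audio_transition_s * code_frame_rate`. Second, there is no one-token offset from next-token prediction. With `code_depth` greater than 1, or with a shifted loss, the index would need adjusting. `split_at_transition` and `post_transition_gap` are hypothetical helpers. The alignment configs have no `audio_transition_s` column and are out of scope here.

```python
from statistics import fmean


def split_at_transition(losses, transition_s: float, frame_rate: float):
    """Split a loss sequence at the transition point.

    Assumption: one loss value per codec frame and no next-token offset, so the
    boundary index is int(transition_s * frame_rate).
    """
    k = int(transition_s * frame_rate)
    return losses[:k], losses[k:]


def post_transition_gap(example: dict) -> float:
    """Mean loss after the transition, negative minus positive.

    A value > 0 means the positive sample had lower average loss after the transition.
    """
    t = example["audio_transition_s"]
    rate = example["code_frame_rate"]
    _, pos_after = split_at_transition(example["positive_sample_tokenwise_loss"], t, rate)
    _, neg_after = split_at_transition(example["negative_sample_tokenwise_loss"], t, rate)
    return fmean(neg_after) - fmean(pos_after)


toy = {
    "audio_transition_s": 1,
    "code_frame_rate": 2,
    "positive_sample_tokenwise_loss": [1.0, 1.0, 2.0, 2.0],
    "negative_sample_tokenwise_loss": [1.0, 1.0, 4.0, 4.0],
}
print(post_transition_gap(toy))
```

In the toy example, the boundary falls at index 2. The two sequences agree before it and differ after it, so the script prints `2.0`.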
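Practical applications
In practice, the dataset can help improve the quality of speech interaction systems. For example, a voice assistant could keep the same timbre and emotional tone when continuing a user's utterance, or a virtual presenter could keep a stable background atmosphere across audio segments. The sentiment alignment and consistency configs can help speech synthesis convey the intended emotion more accurately in customer service, education, and entertainment. Speaker consistency is especially valuable for multi-speaker voice cloning and personalized speech generation. The dataset can also be used in audio content-creation tools to check that automatically generated continuations transition naturally from the original audio in their acoustic properties.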
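Recent research
Current research directions
Generative speech models are developing rapidly in zero-shot speech synthesis and style transfer. As a result, accurately evaluating and controlling generation consistency across multiple conditions has become an active research topic. SALMon_Flow-SLM-1B-Extended-normalized provides a standardized evaluation framework for robust generation by large speech models in complex acoustic environments. Its sub-tasks cover background alignment, background consistency, background-domain consistency, gender consistency, reverberation consistency, sentiment alignment, sentiment consistency, and speaker consistency. Its normalization unifies schema differences between model families, which enables fine-grained cross-model comparison of behavior and consistency. One direction this resource supports is studying how token-wise losses evolve when a large speech model continues audio while holding acoustic attributes, emotional expression, and speaker identity fixed. The aim is consistency that spans everything from perceptual quality to semantic fidelity, laying a data foundation for high-fidelity, highly controllable next-generation speech interaction systems.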
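One simple way to look at token-by-token loss dynamics is a running mean of the loss gap over the continuation. The sketch below assumes that `positive_continuation_tokenwise_loss` and `negative_continuation_tokenwise_loss` are aligned per token; these columns exist only in the consistency configs. When their lengths differ, the sketch compares only the common prefix. `cumulative_gap` is a hypothetical helper.

```python
from itertools import accumulate


def cumulative_gap(pos_losses, neg_losses):
    """Running mean of (negative - positive) per-token loss over the continuation.

    Only the common prefix is compared. A value > 0 at index t means that over
    the first t + 1 tokens the positive continuation had lower average loss.
    """
    n = min(len(pos_losses), len(neg_losses))
    diffs = [neg_losses[i] - pos_losses[i] for i in range(n)]
    return [total / (t + 1) for t, total in enumerate(accumulate(diffs))]


print(cumulative_gap([1.0, 1.0, 1.0], [2.0, 0.0, 4.0]))
```

The toy call prints `[1.0, 0.0, 1.0]`: the per-token differences are 1, -1, and 3. Plotting this curve per example shows whether the model's preference for the consistent continuation builds up gradually or appears only at specific tokens.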
The content above was collected and summarized by 遇见数据集.



