SpeechPPL/SALMon_Flow-SLM-270M-normalized
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_Flow-SLM-270M-normalized
Resource description:
---
configs:
- config_name: bg_alignment
  data_files:
  - split: train
    path: bg_alignment/train-*
- config_name: bg_all_consistency
  data_files:
  - split: train
    path: bg_all_consistency/train-*
- config_name: bg_domain_consistency
  data_files:
  - split: train
    path: bg_domain_consistency/train-*
- config_name: gender_consistency
  data_files:
  - split: train
    path: gender_consistency/train-*
- config_name: rir_consistency
  data_files:
  - split: train
    path: rir_consistency/train-*
- config_name: sentiment_alignment
  data_files:
  - split: train
    path: sentiment_alignment/train-*
- config_name: sentiment_consistency
  data_files:
  - split: train
    path: sentiment_consistency/train-*
- config_name: speaker_consistency
  data_files:
  - split: train
    path: speaker_consistency/train-*
dataset_info:
- config_name: bg_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 86622566
    num_examples: 200
  download_size: 86622566
  dataset_size: 86622566
- config_name: bg_all_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1304740834
    num_examples: 200
  download_size: 1304740834
  dataset_size: 1304740834
- config_name: bg_domain_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1309815780
    num_examples: 200
  download_size: 1309815780
  dataset_size: 1309815780
- config_name: gender_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1305148513
    num_examples: 200
  download_size: 1305148513
  dataset_size: 1305148513
- config_name: rir_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1301731771
    num_examples: 200
  download_size: 1301731771
  dataset_size: 1301731771
- config_name: sentiment_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 46462299
    num_examples: 200
  download_size: 46462299
  dataset_size: 46462299
- config_name: sentiment_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1309803538
    num_examples: 200
  download_size: 1309803538
  dataset_size: 1309803538
- config_name: speaker_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 16000
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    dtype: int64
  - name: ppl_sanity
    dtype: int64
  - name: positive_continuation_tokenwise_loss
    sequence: float64
  - name: negative_continuation_tokenwise_loss
    sequence: float64
  - name: model_generated_continuation1
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation2
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation3
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation4
    dtype:
      audio:
        sampling_rate: 16000
  - name: model_generated_continuation5
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: train
    num_bytes: 1303032690
    num_examples: 200
  download_size: 1303032690
  dataset_size: 1303032690
---
# SALMon Normalized Dataset
This repo preserves the SALMon per-config folder layout while normalizing
mismatched schema details across model families.
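A minimal sketch of how one config might be loaded, assuming the `datasets` library is installed and the repo is reachable. The config names are copied from the card metadata above; `load_config` is a hypothetical helper name, not something the repo ships.

```python
from typing import List

REPO_ID = "SpeechPPL/SALMon_Flow-SLM-270M-normalized"

# Config names copied from the `configs:` block of the card metadata.
CONFIG_NAMES: List[str] = [
    "bg_alignment",
    "bg_all_consistency",
    "bg_domain_consistency",
    "gender_consistency",
    "rir_consistency",
    "sentiment_alignment",
    "sentiment_consistency",
    "speaker_consistency",
]


def load_config(config_name: str):
    """Load the `train` split of one config (hypothetical helper; needs network access)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the constants above can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")
```

Calling `load_config("gender_consistency")` would download that config's files, so expect a large transfer: the metadata lists roughly 1.3 GB for most of the consistency configs.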
Provider:
SpeechPPL
Collected summary
Dataset introduction

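Construction
The SALMon_Flow-SLM-270M-normalized dataset is a reorganized version of the original SALMon dataset. Its main goal is to remove schema inconsistencies caused by architectural differences between model families. It keeps SALMon's per-task, multi-config folder layout, with eight independent subsets such as bg_alignment, gender_consistency, and sentiment_consistency. Each subset contains paired positive and negative audio samples, a prompt audio clip, and several model-generated continuation clips, together with quantitative fields such as token-wise losses, which gives a complete basis for comparing audio generation quality.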
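Features
The dataset's most distinctive feature is its multi-dimensional evaluation design. It covers a range of acoustic and semantic tasks, among them background alignment, domain consistency, gender consistency, room impulse response (RIR) consistency, sentiment alignment, sentiment consistency, and speaker consistency. Each subset also carries metadata: the audio sampling rate, the code frame rate and depth (code_frame_rate, code_depth), the model sampling rate (model_sampling_rate), and a perplexity-related field (ppl_sanity). This fine-grained structure lets researchers diagnose speech generation models systematically from several angles.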
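The metadata columns are not identical across configs. The sketch below compares the non-audio columns of the two `*_alignment` configs with those of the six `*_consistency` configs, as listed in the card metadata; the variable names are illustrative only.

```python
# Non-audio columns of bg_alignment and sentiment_alignment, per the card metadata.
ALIGNMENT_COLUMNS = {
    "task",
    "ind",
    "positive_sample_tokenwise_loss",
    "negative_sample_tokenwise_loss",
    "code_frame_rate",
    "code_depth",
    "model_sampling_rate",
    "ppl_sanity",
}

# Non-audio columns of the six *_consistency configs, per the card metadata.
CONSISTENCY_COLUMNS = ALIGNMENT_COLUMNS | {
    "audio_transition_s",
    "prompt_sample_tokenwise_loss",
    "positive_continuation_tokenwise_loss",
    "negative_continuation_tokenwise_loss",
}

# Columns that only the consistency configs provide.
consistency_only = sorted(CONSISTENCY_COLUMNS - ALIGNMENT_COLUMNS)
print(consistency_only)
```

Code that iterates over all eight configs should therefore treat these four columns as optional rather than assume they are always present.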
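Usage
Load the dataset with the Hugging Face Datasets library and select the config that matches the task, for example 'gender_consistency' to evaluate behavior along the gender dimension. Each subset has a single train split of 200 examples, which suits use as a validation or test benchmark. The positive and negative audio samples and the continuation clips, together with the precomputed token-level losses, can be used for robustness checks or comparative experiments on speech language models, to assess how controllable and consistent a model is on a given attribute.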
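One way the precomputed losses could be turned into a score is a pairwise accuracy: the fraction of pairs where the positive sample gets a lower loss than the negative one. The sketch below assumes that lower mean token-wise loss means the model prefers that sample, and that averaging (rather than summing) is the right aggregation; the card states neither, and `prefers_positive` and `pairwise_accuracy` are hypothetical names. The rows are toy values, not data from the repo.

```python
from statistics import fmean
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, Sequence[float]]


def prefers_positive(row: Row) -> bool:
    """True if the positive sample has the lower mean token-wise loss (assumed criterion)."""
    pos = fmean(row["positive_sample_tokenwise_loss"])
    neg = fmean(row["negative_sample_tokenwise_loss"])
    return pos < neg


def pairwise_accuracy(rows: Iterable[Row]) -> float:
    """Fraction of rows in which the positive sample is preferred."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to score")
    return sum(prefers_positive(r) for r in rows) / len(rows)


# Toy rows with made-up losses, shaped like the dataset columns.
toy_rows = [
    {
        "positive_sample_tokenwise_loss": [1.0, 2.0],
        "negative_sample_tokenwise_loss": [2.0, 3.0],
    },
    {
        "positive_sample_tokenwise_loss": [3.0, 3.0],
        "negative_sample_tokenwise_loss": [1.0, 2.0],
    },
]
print(pairwise_accuracy(toy_rows))  # 0.5
```

The same function accepts rows from a loaded config, since it only reads the two loss columns that every config provides.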
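Background and challenges
Background overview
As large language models advance rapidly in multimodal generation, audio generation models (such as SoundStorm and similar architectures) face a growing need for fine-grained control and semantic consistency. Against this background, the SALMon_Flow-SLM-270M-normalized dataset was recently released by SpeechPPL to evaluate and improve, in a systematic way, how well audio generation models stay aligned along several dimensions. The dataset is organized around eight tasks: background alignment, background consistency, domain consistency, gender consistency, impulse response consistency, sentiment alignment, sentiment consistency, and speaker consistency. Each task contains 200 examples and provides positive and negative sample pairs, model-generated continuation audio, and detailed metadata such as token-wise losses. Its core research question is how to quantify and calibrate the fidelity of generated audio in terms of semantics, acoustic attributes, and contextual coherence, as a basis for more controllable and reliable audio generation systems. The dataset gives the audio generation field a standardized evaluation benchmark and may help move models from plain generation toward controllable generation and semantic alignment.
Current challenges
The domain challenge the dataset addresses is that existing audio generation models generally lack precise control over, and consistency of, fine-grained attributes in their output. A model may, for example, change the background sound, the speaker identity, the emotional tone, or the acoustic environment (such as reverberation), so that the generated content drifts from the user's intent or the preceding context. SALMon_Flow-SLM-270M-normalized uses eight contrastive tasks to expose and quantify this instability along the background, domain, gender, sentiment, speaker, and impulse response dimensions. A second challenge, in building the dataset, is standardizing metadata across model families. Because model families (such as AudioLM and SoundStorm) differ considerably in sampling rate, audio representation, and how losses are computed, the builders had to normalize fields such as the positive and negative sample losses, the code depth, and the frame rate so that the evaluation metrics are comparable across models. In addition, with only 200 examples per task, the benchmark is focused but places higher demands on how representative and balanced the selected data are, so the positive and negative pairs must be constructed carefully to avoid bias.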
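Common scenarios
Typical use case
In speech synthesis and generation, the main use of SALMon_Flow-SLM-270M-normalized is to evaluate and improve language-model-based speech continuation. The dataset provides contrastive samples along several dimensions, covering key acoustic attributes such as background noise alignment, sentiment consistency, speaker consistency, reverberation consistency, and domain consistency. With paired positive and negative audio and the corresponding token-wise model losses, researchers can examine systematically whether a pretrained speech language model keeps the original acoustic characteristics and prosody coherent when it continues a given audio clip. This makes the dataset a suitable benchmark for audio quality evaluation and controllable speech generation research.
Derived and related work
The normalized design of SALMon_Flow-SLM-270M-normalized has inspired follow-up research on the interpretability and controllability of speech language models. Related work includes feature attribution methods that analyze token-wise losses to locate a model's sensitivity to acoustic attributes, and the use of the dataset's contrastive pairs to train attribute-aware reward models for reinforcement-learning alignment. The dataset's directory structure has also served as a reference template for other multi-attribute audio benchmarks (such as the AudioBench series) when organizing multi-task samples, which supports the standardization of evaluation in speech generation.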
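Recent research on the dataset
Latest research directions
In speech generation and perception, SALMon_Flow-SLM-270M-normalized reflects current work on fine-grained evaluation and alignment control of speech language models. Through consistency and alignment tasks along the background, domain, gender, reverberation, sentiment, and speaker dimensions, it evaluates how perceptually faithful a speech language model's generation is and how well it continues the context. Its core idea is to compare token-level losses on positive and negative samples, which quantifies how well a model preserves acoustic and semantic features, and to cross-check this against the model-generated audio provided in each config. This direction responds to the current demand for trustworthy and controllable speech generation systems and supplies data for building robust, expressive speech interaction systems.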
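The token-level comparison described above can be summarized per sample as a perplexity or as a loss margin. The sketch below assumes the stored values are per-token negative log-likelihoods in nats, which the card does not state; if they are a different kind of loss, `sequence_perplexity` is not meaningful and only the margin applies. Both function names are hypothetical.

```python
import math
from typing import Sequence


def mean_loss(tokenwise_loss: Sequence[float]) -> float:
    """Arithmetic mean of the per-token losses."""
    if len(tokenwise_loss) == 0:
        raise ValueError("empty loss sequence")
    return sum(tokenwise_loss) / len(tokenwise_loss)


def sequence_perplexity(tokenwise_loss: Sequence[float]) -> float:
    """exp(mean loss): a perplexity only if the losses are NLLs in nats (assumption)."""
    return math.exp(mean_loss(tokenwise_loss))


def loss_margin(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Mean negative loss minus mean positive loss; above zero means the positive is preferred."""
    return mean_loss(negative) - mean_loss(positive)


# Toy values, not taken from the dataset.
pos = [math.log(2.0)] * 4
neg = [math.log(8.0)] * 4
print(sequence_perplexity(pos), sequence_perplexity(neg), loss_margin(pos, neg))
```

The margin keeps more information than a binary correct/incorrect decision, which can help when two models reach the same pairwise accuracy.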
The content above was collected and summarized by 遇见数据集.



