
SpeechPPL/SALMon_TWIST-7B-normalized

Source: Hugging Face. Updated 2026-04-10. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_TWIST-7B-normalized
Resource summary:
The dataset card declares eight configs. Each has a single `train` split of 200 examples, and for every config `download_size` and `dataset_size` equal `num_bytes`.

| config_name | data_files (split: train) | num_examples | num_bytes |
|---|---|---|---|
| bg_alignment | bg_alignment/train-* | 200 | 86798710 |
| bg_all_consistency | bg_all_consistency/train-* | 200 | 233220532 |
| bg_domain_consistency | bg_domain_consistency/train-* | 200 | 236286683 |
| gender_consistency | gender_consistency/train-* | 200 | 234595453 |
| rir_consistency | rir_consistency/train-* | 200 | 218520027 |
| sentiment_alignment | sentiment_alignment/train-* | 200 | 46603539 |
| sentiment_consistency | sentiment_consistency/train-* | 200 | 231474012 |
| speaker_consistency | speaker_consistency/train-* | 200 | 235250976 |

Features of the two alignment configs (bg_alignment, sentiment_alignment), in declared order:

- task: string
- ind: int64
- positive_audio: audio
- negative_audio: audio
- prompt_audio: audio (sampling_rate: 16000)
- continuation_audio_positive: audio (sampling_rate: 16000)
- continuation_audio_negative: audio (sampling_rate: 16000)
- negative_audio_sanity: audio (sampling_rate: 16000)
- positive_sample_tokenwise_loss: sequence of float32
- positive_sample_raw_units: nested field hubert (string)
- negative_sample_tokenwise_loss: sequence of float32
- negative_sample_raw_units: nested field hubert (string)
- code_frame_rate: int64
- code_depth: int64
- offset: sequence of int64
- model_sampling_rate: sequence of int64
- ppl_sanity: int64
- model_generated_continuation: audio (sampling_rate: 16000)

Features of the six consistency configs (bg_all_consistency, bg_domain_consistency, gender_consistency, rir_consistency, sentiment_consistency, speaker_consistency), in declared order:

- task: string
- ind: int64
- positive_audio: audio
- negative_audio: audio
- audio_transition_s: int64
- prompt_audio: audio (sampling_rate: 16000)
- continuation_audio_positive: audio (sampling_rate: 16000)
- continuation_audio_negative: audio (sampling_rate: 16000)
- negative_audio_sanity: audio (sampling_rate: 16000)
- positive_sample_tokenwise_loss: sequence of float32
- positive_sample_raw_units: nested field hubert (string)
- negative_sample_tokenwise_loss: sequence of float32
- negative_sample_raw_units: nested field hubert (string)
- prompt_sample_tokenwise_loss: sequence of float32
- prompt_sample_raw_units: sequence of int32
- positive_continuation_tokenwise_loss: sequence of float32
- positive_continuation_raw_units: nested field hubert (string)
- negative_continuation_tokenwise_loss: sequence of float32
- negative_continuation_raw_units: nested field hubert (string)
- model_generated_continuation: audio (sampling_rate: 16000)
- code_frame_rate: int64
- code_depth: int64
- offset: sequence of int64
- model_sampling_rate: sequence of int64
- ppl_sanity: int64

# SALMon Normalized Dataset

This repo preserves the SALMon per-config folder layout while normalizing mismatched schema details across model families.
Provider:
SpeechPPL
Collected summary
Dataset introduction
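Construction
SALMon_TWIST-7B-normalized is built from the original SALMon dataset by normalizing its schema. It keeps the original per-config folder layout and covers eight sub-task configs: background alignment, all-background consistency, background-domain consistency, gender consistency, room impulse response (RIR) consistency, sentiment alignment, sentiment consistency, and speaker consistency. Each config contains 200 training samples. A sample consists mainly of audio, most of it declared at a 16 kHz sampling rate, plus structured features such as token-wise loss vectors for the positive and negative samples and HuBERT unit encodings. The normalization keeps the field schema uniform and compatible across model families.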
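Features
The dataset's core feature is multi-dimensional evaluation of audio consistency. Using paired positive and negative samples together with model-generated continuation audio, it measures how well an audio generation model preserves and aligns with specific attributes: background, domain, gender, reverberation, sentiment, and speaker. Every sample carries token-level losses and raw unit information, which supports fine-grained analysis of model behavior in speech understanding and generation.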
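Usage
The dataset can be loaded with the Hugging Face Datasets library by selecting the config name that matches the task, for example 'gender_consistency'. Each record contains a task identifier, an index, positive and negative audio samples, a prompt audio clip, and its continuation segments. Many of the audio fields declare a 16 kHz sampling rate, and all samples are stored in the train split, so they can be used directly for model fine-tuning, for evaluation, or as training material for contrastive learning.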
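The loading step described above might look like the following minimal sketch. It assumes the `datasets` package is installed and the Hugging Face Hub is reachable; the repository ID and config names are taken from this page, while the helper names `load_config` and `demo` are hypothetical and introduced only for illustration.

```python
# Repository ID and config names as listed on this page.
REPO_ID = "SpeechPPL/SALMon_TWIST-7B-normalized"
CONFIGS = (
    "bg_alignment",
    "bg_all_consistency",
    "bg_domain_consistency",
    "gender_consistency",
    "rir_consistency",
    "sentiment_alignment",
    "sentiment_consistency",
    "speaker_consistency",
)


def load_config(config_name: str):
    """Load the single `train` split of one config (hypothetical helper).

    Requires network access and the `datasets` package.
    """
    if config_name not in CONFIGS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {CONFIGS}"
        )
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")


def demo() -> None:
    """Print the columns and the identifiers of the first record."""
    ds = load_config("gender_consistency")
    print(ds.column_names)
    row = ds[0]
    print(row["task"], row["ind"])
```

Calling `demo()` downloads the selected config. Validating the config name before the download is a design choice of this sketch, so that a typo fails fast instead of triggering a network request.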
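Background and challenges
Background overview
SALMon_TWIST-7B-normalized is a resource for evaluating speech generation models. It was created to examine systematically how well audio language models preserve acoustic attributes along several dimensions, and it was released recently by SpeechPPL. The work focuses on quantifying generated speech along fine-grained dimensions: background-noise alignment, background consistency, domain consistency, gender consistency, room impulse response consistency, sentiment alignment and consistency, and speaker consistency. By providing designed positive and negative sample pairs, together with metadata such as token-wise loss over HuBERT units, the dataset offers a standardized benchmark for comparing models on audio fidelity and control of acoustic attributes, in support of higher-quality and more controllable speech synthesis.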
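Current challenges
The core problem the dataset addresses is that current speech generation models can synthesize fluent audio but often fail to preserve specific acoustic attributes, for example through abrupt changes in background noise, an inconsistent speaker gender, or a drift in emotional tone. Such failures limit their reliability in applications such as human-computer interaction and virtual presenters. Building the dataset raises its own challenges: 1) how to extract, from existing generation models, positive and negative sample pairs that represent consistency of different acoustic attributes, and how to label them precisely; 2) how to normalize the schema of diverse outputs from different model families, so that metadata formats no longer differ because of model architecture; 3) how to ensure that an evaluation set with only 200 samples per attribute still has enough statistical power to separate small performance differences between models.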
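The statistical-power concern in point 3 can be made concrete with a rough calculation. The sketch below assumes that a model's score on one config is an accuracy over 200 independent pass/fail items and uses the normal approximation to the binomial; the function name `accuracy_margin` is hypothetical, and this is not presented as the benchmark's own analysis.

```python
import math


def accuracy_margin(p: float, n: int = 200, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval.

    p: observed accuracy in [0, 1]
    n: number of evaluation items (200 per config in this dataset)
    z: critical value (1.96 corresponds to roughly 95% confidence)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Margin at chance level and at a high accuracy, in percentage points.
for p in (0.5, 0.9):
    print(f"p={p:.1f}: +/- {100 * accuracy_margin(p):.1f} points")
```

Under these assumptions the margin near chance level is about 7 percentage points, so two models that differ by only a few points on a single config are hard to tell apart without a paired test or pooling across configs.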
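Common scenarios
Typical use cases
In neural audio coding and generative speech modeling, SALMon_TWIST-7B-normalized provides a standardized benchmark for evaluating and improving the fine-grained controllability of audio language models. Its typical use is to measure how well a model preserves, and stays consistent in, acoustic attributes such as background noise, speaker identity, emotional prosody, room impulse response, and domain style. By comparing the token-wise loss and the raw acoustic unit encodings of positive and negative sample pairs in the audio continuation task, researchers can analyze whether a model, given a prompt, faithfully continues the target acoustic characteristics. This provides a data basis for building more reliable and more expressive speech synthesis and editing systems.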
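The positive-versus-negative comparison described above can be sketched as a pairwise accuracy over the stored token-wise losses. The sketch assumes that the loss fields hold per-token negative log-likelihoods, so a lower loss means the model finds the sample more likely, and that averaging over tokens is an acceptable length normalization. Both points are assumptions, and the function names are hypothetical; the benchmark's official scoring may differ.

```python
from statistics import fmean
from typing import Iterable, Mapping, Sequence


def prefers_positive(row: Mapping[str, Sequence[float]]) -> bool:
    """True if the positive sample has the lower mean token-wise loss."""
    pos = fmean(row["positive_sample_tokenwise_loss"])
    neg = fmean(row["negative_sample_tokenwise_loss"])
    return pos < neg


def pairwise_accuracy(rows: Iterable[Mapping[str, Sequence[float]]]) -> float:
    """Fraction of rows where the positive sample is preferred."""
    decisions = [prefers_positive(row) for row in rows]
    if not decisions:
        raise ValueError("no rows to score")
    return sum(decisions) / len(decisions)


# Toy rows with made-up losses, only to show the expected input shape.
toy_rows = [
    {
        "positive_sample_tokenwise_loss": [1.0, 2.0],
        "negative_sample_tokenwise_loss": [2.0, 3.0],
    },
    {
        "positive_sample_tokenwise_loss": [3.0],
        "negative_sample_tokenwise_loss": [1.0, 1.0],
    },
]
print(pairwise_accuracy(toy_rows))  # 0.5
```

With real data, the rows of a loaded config can be passed in directly, since each record exposes both loss fields by name.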
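Research problems addressed
The dataset targets attribute drift and loss of control, two problems common in current audio generation models. Earlier work often evaluated a single dimension in isolation. SALMon_TWIST-7B-normalized uses a unified, structured set of configs (alignment and consistency) and fine-grained tokenwise loss information, which makes it possible to examine sentiment consistency, speaker robustness, and background alignment together within one framework. It addresses the lack of a reproducible controllability benchmark that spans model families, and it supports a deeper understanding of how well audio language models disentangle their internal representations and where their generalization ends, with stated relevance to forward-looking areas such as speech reconstruction for brain-computer interfaces and multimodal dialogue systems.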
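Derived and related work
Several lines of work are described as building on the standardized framework of SALMon_TWIST-7B-normalized. The alignment configs have been used to develop contrastive learning strategies for audio language models, such as speech variants of preference-alignment fine-tuning (RLHF). The consistency configs have inspired multi-task prompt tuning methods that push a model to maintain several acoustic attributes within a single inference pass. Other work uses the token-wise loss and the acoustic unit encodings to propose token-level attribute interpolation, which allows gradual changes of sentiment and smooth transitions of background environment within a dialogue. Together these efforts form part of the theoretical and technical basis for controllable speech generation.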
The content above was collected and summarized by 遇见数据集.