SpeechPPL/SALMon_GSLM-normalized2

Name: SpeechPPL/SALMon_GSLM-normalized2
Creator: SpeechPPL
Published: 2026-04-10 13:55:31
License: No description available

Hugging Face, updated 2026-04-10, indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/SpeechPPL/SALMon_GSLM-normalized2

Resource summary:

dataset_info (from the Hugging Face dataset card)

The card declares eight configurations. Each has a single train split of 200 examples, and for every configuration num_bytes = download_size = dataset_size:
- bg_alignment: 86999608 bytes
- bg_all_consistency: 321347192 bytes
- bg_domain_consistency: 324740898 bytes
- gender_consistency: 322334278 bytes
- rir_consistency: 309836139 bytes
- sentiment_alignment: 46672549 bytes
- sentiment_consistency: 322075945 bytes
- speaker_consistency: 322234481 bytes

Features present in all eight configurations:
- task: string
- ind: int64
- positive_audio, negative_audio: audio
- prompt_audio, continuation_audio_positive, continuation_audio_negative, negative_audio_sanity: audio, sampling_rate 16000
- positive_sample_tokenwise_loss, negative_sample_tokenwise_loss: sequence of float32
- positive_sample_raw_units, negative_sample_raw_units: nested field hubert (string)
- code_frame_rate, code_depth, ppl_sanity: int64
- model_sampling_rate: sequence of int64
- model_generated_continuation: audio, sampling_rate 22050

Additional features in the six *_consistency configurations (absent from bg_alignment and sentiment_alignment):
- audio_transition_s: int64
- prompt_sample_tokenwise_loss, positive_continuation_tokenwise_loss, negative_continuation_tokenwise_loss: sequence of float32
- prompt_sample_raw_units: sequence of int32
- positive_continuation_raw_units, negative_continuation_raw_units: nested field hubert (string)

# SALMon Normalized Dataset

This repo preserves the SALMon per-config folder layout while normalizing mismatched schema details across model families.
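You can write the card's two feature sets as plain Python sets. With these you can check that rows loaded from a configuration carry every declared field. This is a minimal sketch. The field lists are copied from the dataset_info block. The names expected_features and missing_features are hypothetical helpers, not part of the repository.

```python
# Field names copied from the dataset card's dataset_info block.
COMMON_FEATURES = {
    "task", "ind", "positive_audio", "negative_audio",
    "prompt_audio", "continuation_audio_positive",
    "continuation_audio_negative", "negative_audio_sanity",
    "positive_sample_tokenwise_loss", "positive_sample_raw_units",
    "negative_sample_tokenwise_loss", "negative_sample_raw_units",
    "code_frame_rate", "code_depth", "model_sampling_rate", "ppl_sanity",
    "model_generated_continuation",
}

# Extra fields the card lists only for the six *_consistency configs.
CONSISTENCY_ONLY_FEATURES = {
    "audio_transition_s",
    "prompt_sample_tokenwise_loss", "prompt_sample_raw_units",
    "positive_continuation_tokenwise_loss", "positive_continuation_raw_units",
    "negative_continuation_tokenwise_loss", "negative_continuation_raw_units",
}

CONFIGS = {
    "bg_alignment", "bg_all_consistency", "bg_domain_consistency",
    "gender_consistency", "rir_consistency", "sentiment_alignment",
    "sentiment_consistency", "speaker_consistency",
}


def expected_features(config_name):
    """Return the set of field names the card declares for one config (hypothetical helper)."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name}")
    if config_name.endswith("_consistency"):
        return COMMON_FEATURES | CONSISTENCY_ONLY_FEATURES
    return set(COMMON_FEATURES)


def missing_features(config_name, row):
    """Return declared fields that are absent from a loaded row (a dict)."""
    return expected_features(config_name) - set(row)
```

For example, missing_features("rir_consistency", ds[0]) should return an empty set if the normalization held for the first row of that configuration.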

Provider:

SpeechPPL

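Dataset Introduction

Construction

SALMon_GSLM-normalized2 is a normalized release of the SALMon benchmark for speech language models. According to its dataset card, the repository keeps SALMon's per-configuration folder layout and normalizes schema details that differed across model families.

Each of the eight configurations holds 200 examples. Every example pairs a positive recording with a negative one. It also stores the following audio at 16 kHz:
- a prompt segment
- positive and negative continuations
- a negative_audio_sanity recording

Model-side outputs are stored alongside the audio:
- token-wise losses and raw HuBERT units for the positive and negative samples
- the code frame rate and code depth
- the model sampling rate
- a ppl_sanity integer
- a model-generated continuation at 22.05 kHz

The six consistency configurations add an audio_transition_s field, plus separate losses and units for the prompt and for each continuation.

"GSLM" in the name presumably refers to a Generative Spoken Language Modeling unit language model. The card does not say which model produced the stored outputs.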
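You can cut a full 16 kHz recording into a prompt and a continuation at the transition point. This assumes audio_transition_s is the time, in seconds, at which the tested property switches; the card does not define the field. This sketch is hypothetical: split_at_transition is not a repository function. It works on any 1-D sequence of samples, such as a decoded audio array.

```python
def split_at_transition(samples, transition_s, sampling_rate=16000):
    """Split a mono waveform into (prompt, continuation) at transition_s seconds.

    Assumption: audio_transition_s marks where the tested property changes.
    The default sampling rate matches the 16 kHz declared for the prompt and
    continuation audio fields in the dataset card.
    """
    if transition_s < 0:
        raise ValueError("transition_s must be non-negative")
    # Clamp the cut index so a transition past the end yields an empty continuation.
    cut = min(int(transition_s * sampling_rate), len(samples))
    return samples[:cut], samples[cut:]
```

The slicing works unchanged on Python lists and on NumPy arrays.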

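Usage

Researchers can use SALMon_GSLM-normalized2 as an evaluation benchmark for speech language models. You can load each configuration on its own with the Hugging Face Datasets library by passing the configuration name. The card declares only a train split of 200 examples per configuration, so there are no predefined validation or test splits.

Under the SALMon protocol, a model answers an example correctly when it assigns a higher likelihood to the positive sample than to the negative one. The stored token-wise losses let you recompute this comparison without running the model again. The raw HuBERT units and model-generated continuations support further per-example inspection.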
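You can recompute the SALMon comparison from the stored losses. This sketch assumes each *_tokenwise_loss entry holds per-token negative log-likelihoods, so their sum is the negative log-likelihood of the whole sequence. The helper names are hypothetical, and ties count as errors. load_config wraps the real datasets.load_dataset call. Because it needs network access and the datasets package, it runs only when called.

```python
import math


def total_loss(tokenwise_loss):
    """Sum per-token losses; for per-token NLLs this equals -log p(sequence)."""
    return math.fsum(tokenwise_loss)


def is_correct(row, prefix="sample"):
    """True if the positive item has a strictly lower total loss than the negative one.

    prefix="sample" compares positive_sample_tokenwise_loss with
    negative_sample_tokenwise_loss; prefix="continuation" compares the
    continuation losses, which exist only in *_consistency configs.
    """
    pos = total_loss(row[f"positive_{prefix}_tokenwise_loss"])
    neg = total_loss(row[f"negative_{prefix}_tokenwise_loss"])
    return pos < neg


def salmon_accuracy(rows, prefix="sample"):
    """Fraction of rows where the positive item is preferred."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to score")
    return sum(is_correct(r, prefix) for r in rows) / len(rows)


def load_config(config_name):
    """Load one configuration's train split from the Hub (not run on import)."""
    from datasets import load_dataset
    return load_dataset("SpeechPPL/SALMon_GSLM-normalized2", config_name, split="train")
```

Scoring from these columns does not require the audio. To avoid decoding it, select only the loss columns first and pass the result to salmon_accuracy, for example: load_config("speaker_consistency").select_columns(["positive_sample_tokenwise_loss", "negative_sample_tokenwise_loss"]).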

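Background and Challenges

Background Overview

SALMon_GSLM-normalized2 comes from research on speech language models. SALMon, a suite for acoustic language model evaluation, tests whether such models are sensitive to the acoustic properties of speech and not only to its lexical content. It has two task families:
- Consistency tasks check whether a model prefers a recording in which a property stays constant over one in which it changes partway through. Here they are gender_consistency, speaker_consistency, sentiment_consistency, rir_consistency (room impulse response), and bg_domain_consistency and bg_all_consistency (background noise).
- Alignment tasks check whether the model prefers a recording whose sentiment or background matches the spoken content. Here they are sentiment_alignment and bg_alignment.

GSLM stands for Generative Spoken Language Modeling. This normalized copy was published on Hugging Face by SpeechPPL on 2026-04-10.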

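Current Challenges

The benchmark's main challenge is that a model must track acoustic properties rather than only what is said. These include speaker identity, gender, emotion, background noise, and room acoustics. The alignment tasks additionally require relating those acoustic cues to the meaning of the utterance.

On the data side, this repository addresses a different problem: outputs from different model families were stored with mismatched schemas. Models also differ in code frame rate, code depth, and sampling rate, which the dataset records per example. Raw token-wise losses from different models should therefore be compared with care.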

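Common Scenarios

Classic Use Case

SALMon_GSLM-normalized2 is mainly used to evaluate generative speech language models. Researchers compare the losses a model assigns to positive and negative samples to measure acoustic consistency and acoustic-semantic alignment. The stored token-wise losses, HuBERT units, and model-generated continuations also support analysis of individual examples, for instance locating where in a negative sample the loss rises.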
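For per-example analysis, you can turn a token-wise loss sequence into a perplexity: the exponential of the mean per-token loss. This assumes natural-log negative log-likelihoods. Perplexity is a standard quantity, but the repository does not document whether its ppl_sanity field is computed this way. The helper names below are hypothetical.

```python
import math


def perplexity(tokenwise_loss):
    """Return exp(mean per-token loss); assumes natural-log NLL values."""
    if len(tokenwise_loss) == 0:
        raise ValueError("empty loss sequence")
    return math.exp(math.fsum(tokenwise_loss) / len(tokenwise_loss))


def perplexity_gap(row, prefix="sample"):
    """Negative-minus-positive perplexity; a positive value favours the positive item."""
    pos = perplexity(row[f"positive_{prefix}_tokenwise_loss"])
    neg = perplexity(row[f"negative_{prefix}_tokenwise_loss"])
    return neg - pos
```

Unlike a summed-loss comparison, averaging normalizes for sequence length. The two can therefore rank a positive/negative pair differently when the samples contain different numbers of tokens.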

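Academic Problems Addressed

The dataset addresses inconsistent evaluation outputs across model families. With a normalized schema, you can load and compare SALMon results for different models in the same way. This supports research into whether end-to-end speech language models capture paralinguistic and acoustic information (speaker, emotion, recording environment) in addition to lexical content.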

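Practical Applications

In practice, the benchmark can serve as a diagnostic when developing speech language models for applications such as voice assistants. In those settings, generated speech should stay consistent with the speaker, emotion, and acoustic environment of the input.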

Recent Research on the Dataset