SpeechPPL/SALMon_GSLM-normalized2
Source: Hugging Face. Updated 2026-04-10. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_GSLM-normalized2
Resource description:
---
dataset_info:
- config_name: bg_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  splits:
  - name: train
    num_bytes: 86999608
    num_examples: 200
  download_size: 86999608
  dataset_size: 86999608
- config_name: bg_all_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 321347192
    num_examples: 200
  download_size: 321347192
  dataset_size: 321347192
- config_name: bg_domain_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 324740898
    num_examples: 200
  download_size: 324740898
  dataset_size: 324740898
- config_name: gender_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 322334278
    num_examples: 200
  download_size: 322334278
  dataset_size: 322334278
- config_name: rir_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 309836139
    num_examples: 200
  download_size: 309836139
  dataset_size: 309836139
- config_name: sentiment_alignment
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  splits:
  - name: train
    num_bytes: 46672549
    num_examples: 200
  download_size: 46672549
  dataset_size: 46672549
- config_name: sentiment_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 322075945
    num_examples: 200
  download_size: 322075945
  dataset_size: 322075945
- config_name: speaker_consistency
  features:
  - name: task
    dtype: string
  - name: ind
    dtype: int64
  - name: positive_audio
    dtype: audio
  - name: negative_audio
    dtype: audio
  - name: audio_transition_s
    dtype: int64
  - name: prompt_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_positive
    dtype:
      audio:
        sampling_rate: 16000
  - name: continuation_audio_negative
    dtype:
      audio:
        sampling_rate: 16000
  - name: negative_audio_sanity
    dtype:
      audio:
        sampling_rate: 16000
  - name: positive_sample_tokenwise_loss
    sequence: float32
  - name: positive_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_sample_tokenwise_loss
    sequence: float32
  - name: negative_sample_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: prompt_sample_tokenwise_loss
    sequence: float32
  - name: prompt_sample_raw_units
    sequence: int32
  - name: positive_continuation_tokenwise_loss
    sequence: float32
  - name: positive_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: negative_continuation_tokenwise_loss
    sequence: float32
  - name: negative_continuation_raw_units
    dtype:
      - name: hubert
        dtype: string
  - name: model_generated_continuation
    dtype:
      audio:
        sampling_rate: 22050
  - name: code_frame_rate
    dtype: int64
  - name: code_depth
    dtype: int64
  - name: model_sampling_rate
    sequence: int64
  - name: ppl_sanity
    dtype: int64
  splits:
  - name: train
    num_bytes: 322234481
    num_examples: 200
  download_size: 322234481
  dataset_size: 322234481
---
# SALMon Normalized Dataset
This repo preserves the SALMon per-config folder layout while normalizing
mismatched schema details across model families.
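As a quick planning aid before downloading, the per-config sizes listed in the metadata above can be totalled. The sketch below copies the num_bytes values from the dataset_info block; the names CONFIG_NUM_BYTES, EXAMPLES_PER_CONFIG, total_bytes, and total_examples are illustrative and not part of any library.

```python
# Split metadata copied from the dataset_info block above (train split only).
CONFIG_NUM_BYTES = {
    "bg_alignment": 86_999_608,
    "bg_all_consistency": 321_347_192,
    "bg_domain_consistency": 324_740_898,
    "gender_consistency": 322_334_278,
    "rir_consistency": 309_836_139,
    "sentiment_alignment": 46_672_549,
    "sentiment_consistency": 322_075_945,
    "speaker_consistency": 322_234_481,
}

# Every config in the metadata reports num_examples: 200.
EXAMPLES_PER_CONFIG = 200


def total_bytes(sizes):
    """Sum the byte counts of all configs."""
    return sum(sizes.values())


def total_examples(sizes, per_config=EXAMPLES_PER_CONFIG):
    """Total number of examples across configs."""
    return len(sizes) * per_config


if __name__ == "__main__":
    # List configs from largest to smallest, in megabytes (10**6 bytes).
    for name, size in sorted(CONFIG_NUM_BYTES.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:24s} {size / 1e6:8.1f} MB")
    print(f"total: {total_bytes(CONFIG_NUM_BYTES) / 1e9:.2f} GB, "
          f"{total_examples(CONFIG_NUM_BYTES)} examples")
```

The two alignment configs are noticeably smaller than the six consistency configs, so they are a cheaper starting point when bandwidth is limited.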
Provider:
SpeechPPL
Collected summary
Dataset introduction

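Construction
SALMon_GSLM-normalized2 follows the per-config layout of the SALMon benchmark. According to the dataset card, the repo preserves that folder layout and normalizes schema details that differ across model families. It contains eight configs: bg_alignment, bg_all_consistency, bg_domain_consistency, gender_consistency, rir_consistency, sentiment_alignment, sentiment_consistency, and speaker_consistency. Each config has a single train split of 200 examples. Every example pairs a positive and a negative audio sample with 16,000 Hz prompt and continuation audio, per-token loss sequences for both samples, unit strings stored under a hubert key, and a model-generated continuation at 22,050 Hz. The six consistency configs also record audio_transition_s, plus separate losses and units for the prompt and for each continuation.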
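Usage
SALMon_GSLM-normalized2 is best used as an evaluation benchmark for speech language models. Each config can be loaded with the Hugging Face Datasets library by passing the config name. Every config ships only a train split of 200 examples, so any validation or test partition has to be created by the user. The stored per-token losses can be used directly: for each example, compare positive_sample_tokenwise_loss with negative_sample_tokenwise_loss and report how often the positive sample receives the lower loss.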
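One way to turn the stored losses into a score is a pairwise accuracy: the fraction of examples whose positive sample has a lower loss than its negative sample. The sketch below is a minimal illustration rather than the dataset authors' evaluation script. Averaging the per-token losses (instead of summing them) and counting ties as misses are assumptions, and the helper names pairwise_accuracy and load_losses are hypothetical. The column names come from the schema above.

```python
from statistics import fmean

REPO_ID = "SpeechPPL/SALMon_GSLM-normalized2"
LOSS_COLUMNS = [
    "positive_sample_tokenwise_loss",
    "negative_sample_tokenwise_loss",
]


def pairwise_accuracy(rows):
    """Fraction of rows whose positive sample has the lower mean per-token loss.

    Assumption: losses are compared by their mean; a tie counts as a miss.
    """
    total = 0
    wins = 0
    for row in rows:
        positive = fmean(row["positive_sample_tokenwise_loss"])
        negative = fmean(row["negative_sample_tokenwise_loss"])
        wins += positive < negative
        total += 1
    if total == 0:
        raise ValueError("no rows to score")
    return wins / total


def load_losses(config_name):
    """Load one config and keep only the loss columns, so no audio is decoded."""
    # Imported lazily so the scoring helper works without the library installed.
    from datasets import load_dataset

    ds = load_dataset(REPO_ID, config_name, split="train")
    return ds.select_columns(LOSS_COLUMNS)


if __name__ == "__main__":
    # Toy rows with the same column names as the dataset (values are made up).
    toy_rows = [
        {"positive_sample_tokenwise_loss": [1.0, 2.0],
         "negative_sample_tokenwise_loss": [2.0, 3.0]},
        {"positive_sample_tokenwise_loss": [3.0, 3.0],
         "negative_sample_tokenwise_loss": [1.0]},
    ]
    print(pairwise_accuracy(toy_rows))  # prints 0.5
```

With network access and the datasets package installed, `pairwise_accuracy(load_losses("bg_alignment"))` would score a single config; the same call works for any of the eight config names.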
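Background and challenges
Background overview
SALMon_GSLM-normalized2 is provided by SpeechPPL and targets the evaluation of speech language models. GSLM refers to the Generative Spoken Language Model family, which models speech as sequences of discrete units; in this dataset those units are stored under a hubert key. The core question is whether a model assigns a lower loss to an acoustically consistent or well-aligned sample than to its mismatched counterpart. The configs probe this for background noise (bg), room impulse response (rir), gender, speaker, and sentiment.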
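Current challenges
The central difficulty is that positive and negative samples differ in acoustic properties rather than in wording. A model working on discrete units may lose the background, reverberation, speaker, or sentiment cues that separate the two, which makes the losses hard to tell apart. A second challenge lies in normalization. The schema has to reconcile model-specific fields such as code_frame_rate, code_depth, and model_sampling_rate, and it mixes 16,000 Hz source audio with 22,050 Hz generated continuations. Finally, with only 200 examples per config, small accuracy differences between models should be interpreted with care.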
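Common scenarios
Classic use case
The classic use of SALMon_GSLM-normalized2 is evaluating generative speech language models on paired samples. The two alignment configs (bg_alignment and sentiment_alignment) test whether the background or the sentiment of the audio matches the spoken content. The six consistency configs test whether an acoustic property stays the same from the prompt to the continuation, with audio_transition_s presumably marking the transition point in seconds.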
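Academic problems addressed
The dataset addresses mismatched result formats across model families. By storing audio, unit sequences, and per-token losses under one normalized schema, it lets researchers compare models on the same examples and study how well discrete-unit speech models capture acoustic information beyond the text. It also provides a baseline for analyzing how normalization choices affect such comparisons.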
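Practical applications
In practice, the dataset suits benchmarking rather than training, since each config holds only 200 examples. It can serve as a regression check for voice assistants and other speech systems that need to track tone, speaker, and acoustic environment in addition to textual intent.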
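Recent research on the dataset
Latest research directions
Sentiment is one of the acoustic properties covered by the dataset, through the sentiment_alignment and sentiment_consistency configs, alongside background, room impulse response, gender, and speaker. Relevant research directions include speech language models built on self-supervised units, such as the HuBERT units used with GSLM, and methods that retain paralinguistic information in those units. The normalized schema offers a common reference for comparing such models across acoustic conditions.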
The above content was collected and summarized by 遇见数据集.



