SpeechPPL/SALMon_TASLM-normalized2
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_TASLM-normalized2
Resource overview:
---
dataset_info:
- config_name: bg_alignment
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 87024318
num_examples: 200
download_size: 87024318
dataset_size: 87024318
- config_name: bg_all_consistency
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: audio_transition_s
dtype: int64
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: prompt_asr_text
dtype: string
- name: prompt_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: prompt_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: model_generated_continuation
dtype:
audio:
sampling_rate: 22050
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: prompt_asr_text_old
dtype: string
- name: prompt_sample_wordlevel_loss_old
sequence: float64
- name: positive_continuation_wordlevel_loss
sequence: float32
- name: negative_continuation_wordlevel_loss
sequence: float32
- name: continuation_asr_text
dtype: string
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 270240761
num_examples: 200
download_size: 270240761
dataset_size: 270240761
- config_name: bg_domain_consistency
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: audio_transition_s
dtype: int64
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: prompt_asr_text
dtype: string
- name: prompt_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: prompt_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: model_generated_continuation
dtype:
audio:
sampling_rate: 22050
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: prompt_asr_text_old
dtype: string
- name: prompt_sample_wordlevel_loss_old
sequence: float64
- name: positive_continuation_wordlevel_loss
sequence: float32
- name: negative_continuation_wordlevel_loss
sequence: float32
- name: continuation_asr_text
dtype: string
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 270011718
num_examples: 200
download_size: 270011718
dataset_size: 270011718
- config_name: gender_consistency
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: audio_transition_s
dtype: int64
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: prompt_asr_text
dtype: string
- name: prompt_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: prompt_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: model_generated_continuation
dtype:
audio:
sampling_rate: 22050
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: prompt_asr_text_old
dtype: string
- name: prompt_sample_wordlevel_loss_old
sequence: float64
- name: positive_continuation_wordlevel_loss
sequence: float32
- name: negative_continuation_wordlevel_loss
sequence: float32
- name: continuation_asr_text
dtype: string
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 278743927
num_examples: 200
download_size: 278743927
dataset_size: 278743927
- config_name: rir_consistency
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: audio_transition_s
dtype: int64
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: prompt_asr_text
dtype: string
- name: prompt_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: prompt_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: model_generated_continuation
dtype:
audio:
sampling_rate: 22050
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: prompt_asr_text_old
dtype: string
- name: prompt_sample_wordlevel_loss_old
sequence: float64
- name: positive_continuation_wordlevel_loss
sequence: float32
- name: negative_continuation_wordlevel_loss
sequence: float32
- name: continuation_asr_text
dtype: string
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 259091116
num_examples: 200
download_size: 259091116
dataset_size: 259091116
- config_name: sentiment_alignment
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 46920619
num_examples: 200
download_size: 46920619
dataset_size: 46920619
- config_name: sentiment_consistency
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: audio_transition_s
dtype: int64
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: prompt_asr_text
dtype: string
- name: prompt_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: prompt_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: model_generated_continuation
dtype:
audio:
sampling_rate: 22050
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: prompt_asr_text_old
dtype: string
- name: prompt_sample_wordlevel_loss_old
sequence: float64
- name: positive_continuation_wordlevel_loss
sequence: float32
- name: negative_continuation_wordlevel_loss
sequence: float32
- name: continuation_asr_text
dtype: string
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 272654768
num_examples: 200
download_size: 272654768
dataset_size: 272654768
- config_name: speaker_consistency
features:
- name: task
dtype: string
- name: ind
dtype: int64
- name: positive_audio
dtype: audio
- name: negative_audio
dtype: audio
- name: audio_transition_s
dtype: int64
- name: prompt_audio
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_positive
dtype:
audio:
sampling_rate: 16000
- name: continuation_audio_negative
dtype:
audio:
sampling_rate: 16000
- name: negative_audio_sanity
dtype:
audio:
sampling_rate: 16000
- name: positive_asr_text
dtype: string
- name: positive_spk_embed
sequence: float32
- name: negative_asr_text
dtype: string
- name: negative_spk_embed
sequence: float32
- name: prompt_asr_text
dtype: string
- name: prompt_spk_embed
sequence: float32
- name: positive_sample_wordlevel_loss
sequence: float32
- name: negative_sample_wordlevel_loss
sequence: float32
- name: prompt_sample_wordlevel_loss
sequence: float32
- name: code_frame_rate
dtype: string
- name: code_depth
dtype: int64
- name: model_sampling_rate
dtype: int64
- name: ppl_sanity
dtype: int64
- name: model_generated_continuation
dtype:
audio:
sampling_rate: 22050
- name: positive_asr_text_old
dtype: string
- name: negative_asr_text_old
dtype: string
- name: negative_sample_wordlevel_loss_old
sequence: float64
- name: positive_sample_wordlevel_loss_old
sequence: float64
- name: positive_asr_chunks
list:
- name: text
dtype: string
- name: timestamp
sequence: float64
- name: prompt_asr_text_old
dtype: string
- name: prompt_sample_wordlevel_loss_old
sequence: float64
- name: positive_continuation_wordlevel_loss
sequence: float32
- name: negative_continuation_wordlevel_loss
sequence: float32
- name: continuation_asr_text
dtype: string
- name: ppl_sanity_aligned
dtype: int64
splits:
- name: train
num_bytes: 283506280
num_examples: 200
download_size: 283506280
dataset_size: 283506280
---
# SALMon Normalized Dataset
This repo preserves the SALMon per-config folder layout while normalizing
mismatched schema details across model families.
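
As a rough illustration of what the metadata above implies, the sketch below tabulates the eight configs and the fields that appear only in the `*_consistency` configs. The numbers and field names are copied from the `dataset_info` block; the helper names (`summarize`, `expected_extra_fields`) are hypothetical and not part of the repo.

```python
# Sizes in bytes, copied from the num_bytes entries in the dataset_info metadata.
CONFIG_BYTES = {
    "bg_alignment": 87024318,
    "bg_all_consistency": 270240761,
    "bg_domain_consistency": 270011718,
    "gender_consistency": 278743927,
    "rir_consistency": 259091116,
    "sentiment_alignment": 46920619,
    "sentiment_consistency": 272654768,
    "speaker_consistency": 283506280,
}

# Every config lists a single train split with 200 examples.
EXAMPLES_PER_CONFIG = 200

# Fields listed for the *_consistency configs but absent from the *_alignment ones.
CONSISTENCY_ONLY_FIELDS = {
    "audio_transition_s",
    "prompt_asr_text",
    "prompt_spk_embed",
    "prompt_sample_wordlevel_loss",
    "model_generated_continuation",
    "prompt_asr_text_old",
    "prompt_sample_wordlevel_loss_old",
    "positive_continuation_wordlevel_loss",
    "negative_continuation_wordlevel_loss",
    "continuation_asr_text",
}


def summarize():
    """Return (total bytes, total examples) across all configs."""
    total_bytes = sum(CONFIG_BYTES.values())
    total_examples = EXAMPLES_PER_CONFIG * len(CONFIG_BYTES)
    return total_bytes, total_examples


def expected_extra_fields(config_name):
    """Return the fields a config has beyond the shared alignment schema."""
    if config_name.endswith("_consistency"):
        return set(CONSISTENCY_ONLY_FIELDS)
    return set()


if __name__ == "__main__":
    total_bytes, total_examples = summarize()
    print(f"{len(CONFIG_BYTES)} configs, {total_examples} examples, "
          f"{total_bytes / 1e9:.2f} GB")
```

The two `*_alignment` configs are the smallest, which is consistent with their shorter field list, though the metadata does not state the reason.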
Provided by:
SpeechPPL
Collected summary
Dataset introduction

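Construction
According to the aggregator's summary, SALMon_TASLM-normalized2 grew out of work on sentiment analysis and language-model interaction: researchers collected a large sentiment-annotated text corpus spanning varied scenarios and styles, applied a TASLM architecture (expanded in the summary as "Task-Aware Sentiment Language Model") for feature extraction and annotation-consistency correction, and then normalized all samples to remove scale differences and data bias. The dataset card does not confirm this account. It states only that the repo preserves the SALMon per-config folder layout while normalizing mismatched schema details across model families, and its schema consists of audio, ASR transcripts, speaker embeddings, and word-level losses rather than sentiment-labeled text.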
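Characteristics
The summary attributes the dataset's main strengths to normalization: high robustness and applicability across scenarios, a more concentrated feature distribution with less noise, and preserved context dependence in both short and long inputs, all framed in terms of fine-grained sentiment analysis. The dataset card documents none of these properties. What its metadata does show is a uniform layout: eight configs, each with a single train split of 200 examples.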
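Usage
The dataset can be loaded directly from Hugging Face. The summary mentions training, validation, and test splits, but the dataset card defines only a train split of 200 examples for each of the eight configs, so a config name has to be chosen when loading. The summary also suggests training sequence-labeling or classification models and matching sentiment labels to the model's output layer. The schema contains no sentiment-label field. Instead, each row pairs positive_audio with negative_audio and stores word-level losses for both. Review the features of the chosen config before use, since the alignment and consistency configs do not share an identical field list.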
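
A minimal sketch of loading one config and comparing the stored losses, under several assumptions: the Hub repo id matches the page title, the `datasets` package is installed, and a lower mean word-level loss on the positive sample counts as the model "preferring" it. The dataset card does not define a scoring rule, so `prefers_positive` and `pairwise_accuracy` are hypothetical helpers, not the authors' metric.

```python
from statistics import mean

# Assumed Hub repo id, taken from the page title and download link.
REPO_ID = "SpeechPPL/SALMon_TASLM-normalized2"

# Loss columns present in every config according to the metadata.
LOSS_COLUMNS = [
    "positive_sample_wordlevel_loss",
    "negative_sample_wordlevel_loss",
]


def load_losses(config_name):
    """Load the train split of one config, keeping only the loss columns.

    Requires the `datasets` package and network access. Dropping the audio
    columns avoids decoding audio when iterating over rows.
    """
    from datasets import load_dataset  # lazy import: helpers below do not need it

    ds = load_dataset(REPO_ID, config_name, split="train")
    return ds.select_columns(LOSS_COLUMNS)


def mean_loss(losses):
    """Average a list of word-level losses; an empty list gives NaN."""
    return mean(losses) if losses else float("nan")


def prefers_positive(row):
    """True when the positive sample has the lower mean word-level loss."""
    pos = mean_loss(row["positive_sample_wordlevel_loss"])
    neg = mean_loss(row["negative_sample_wordlevel_loss"])
    return pos < neg  # any NaN makes this False


def pairwise_accuracy(rows):
    """Fraction of rows in which the positive sample is preferred."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to score")
    return sum(prefers_positive(r) for r in rows) / len(rows)


if __name__ == "__main__":
    # Synthetic rows so the example runs without downloading anything.
    demo = [
        {"positive_sample_wordlevel_loss": [1.0, 2.0],
         "negative_sample_wordlevel_loss": [2.0, 3.0]},
        {"positive_sample_wordlevel_loss": [3.0],
         "negative_sample_wordlevel_loss": [1.0]},
    ]
    print(pairwise_accuracy(demo))
```

With real data, `pairwise_accuracy(load_losses("gender_consistency"))` would score one config. Whether the mean, the sum, or another aggregate of the word-level losses is appropriate is left open here.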
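Background and challenges
Background overview
The summary places SALMon_TASLM-normalized2 in vision-language research, describing it as a benchmark built jointly by academic and industry researchers from normalized, task-driven visual question answering samples, aimed at cross-modal semantic alignment, embodied intelligence, and scene understanding. This does not match the dataset card, which lists no image or visual fields. The card's features are audio clips (prompts, positive and negative samples, continuations), ASR transcripts, speaker embeddings, and word-level losses, organized into SALMon-style alignment and consistency configs.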
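Current challenges
In the same vision-language framing, the summary lists three challenges: bridging the semantic gap between language models and visual perception under non-ideal imaging conditions such as lighting changes and occlusion; keeping fine-grained annotations consistent so that text descriptions still correspond to visual elements after normalization, without the normalization itself adding noise; and balancing task diversity (for example counting and spatial-relation reasoning) against dataset size. The dataset card mentions none of these.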
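Common scenarios
Typical use cases
The summary presents the dataset as a benchmark for multimodal affective computing and dialogue systems, designed for audio-visual emotion recognition and covering facial expressions, speech prosody, and text semantics, with normalized emotion labels intended to remove annotator bias. The dataset card supports only part of this. The configs do include sentiment_alignment and sentiment_consistency, alongside background (bg), gender, room impulse response (rir), and speaker configs, but every field is audio or derived from audio, and no facial-expression or emotion-label field appears.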
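Research problems addressed
According to the summary, the dataset addresses cross-modal misalignment and inconsistent annotation in multimodal emotion recognition. It is said to offer strict temporal synchronization and normalized emotion labels as a basis for studying how each modality contributes to emotional expression, for testing attention mechanisms and graph neural networks in fusion tasks, and for building debiased emotion-understanding frameworks. The summary further claims an impact on moving affective computing from laboratory conditions to real settings such as healthcare and education. The dataset card makes no such claims.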
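Derived and related work
The summary names three derived works: CL-MER, a cross-modal contrastive learning framework said to use the dataset's normalized emotion anchors to optimize mutual information between modality representations; MS-TCN, described as a multi-scale temporal Transformer that used the dataset to show that frame-level emotion fluctuations support inference of long-term psychological state; and ECRNet, an emotion-consistency regularization method that constrains the entropy of each modality decoder's output emotion distribution to limit degradation when a modality is missing. It also credits these works with advancing meme-driven models of emotion evolution and weakly supervised emotion localization. The summary gives no citations for any of them, and the dataset card does not reference them.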
The content above was collected and summarized by 遇见数据集.



