five

SpeechPPL/SALMon_TASLM-normalized2

收藏
Hugging Face2026-04-10 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/SpeechPPL/SALMon_TASLM-normalized2
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: - config_name: bg_alignment features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 87024318 num_examples: 200 download_size: 87024318 dataset_size: 87024318 - config_name: bg_all_consistency features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: audio_transition_s dtype: int64 - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: prompt_asr_text dtype: string - name: prompt_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: prompt_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: model_generated_continuation dtype: audio: sampling_rate: 22050 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: prompt_asr_text_old dtype: string - name: prompt_sample_wordlevel_loss_old sequence: float64 - name: positive_continuation_wordlevel_loss sequence: float32 - name: negative_continuation_wordlevel_loss sequence: float32 - name: continuation_asr_text dtype: string - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 270240761 num_examples: 200 download_size: 270240761 dataset_size: 270240761 - config_name: bg_domain_consistency features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: audio_transition_s dtype: int64 - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: prompt_asr_text dtype: string - name: prompt_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: prompt_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: model_generated_continuation dtype: audio: sampling_rate: 22050 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: prompt_asr_text_old dtype: string - name: prompt_sample_wordlevel_loss_old sequence: float64 - name: positive_continuation_wordlevel_loss sequence: float32 - name: negative_continuation_wordlevel_loss sequence: float32 - name: continuation_asr_text dtype: string - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 270011718 num_examples: 200 download_size: 270011718 dataset_size: 270011718 - config_name: gender_consistency features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: audio_transition_s dtype: int64 - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: prompt_asr_text dtype: string - name: prompt_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: prompt_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: model_generated_continuation dtype: audio: sampling_rate: 22050 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: prompt_asr_text_old dtype: string - name: prompt_sample_wordlevel_loss_old sequence: float64 - name: positive_continuation_wordlevel_loss sequence: float32 - name: negative_continuation_wordlevel_loss sequence: float32 - name: continuation_asr_text dtype: string - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 278743927 num_examples: 200 download_size: 278743927 dataset_size: 278743927 - config_name: rir_consistency features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: audio_transition_s dtype: int64 - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: prompt_asr_text dtype: string - name: prompt_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: prompt_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: model_generated_continuation dtype: audio: sampling_rate: 22050 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: prompt_asr_text_old dtype: string - name: prompt_sample_wordlevel_loss_old sequence: float64 - name: positive_continuation_wordlevel_loss sequence: float32 - name: negative_continuation_wordlevel_loss sequence: float32 - name: continuation_asr_text dtype: string - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 259091116 num_examples: 200 download_size: 259091116 dataset_size: 259091116 - config_name: sentiment_alignment features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 46920619 num_examples: 200 download_size: 46920619 dataset_size: 46920619 - config_name: sentiment_consistency features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: audio_transition_s dtype: int64 - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: prompt_asr_text dtype: string - name: prompt_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: prompt_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: model_generated_continuation dtype: audio: sampling_rate: 22050 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: prompt_asr_text_old dtype: string - name: prompt_sample_wordlevel_loss_old sequence: float64 - name: positive_continuation_wordlevel_loss sequence: float32 - name: negative_continuation_wordlevel_loss sequence: float32 - name: continuation_asr_text dtype: string - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 272654768 num_examples: 200 download_size: 272654768 dataset_size: 272654768 - config_name: speaker_consistency features: - name: task dtype: string - name: ind dtype: int64 - name: positive_audio dtype: audio - name: negative_audio dtype: audio - name: audio_transition_s dtype: int64 - name: prompt_audio dtype: audio: sampling_rate: 16000 - name: continuation_audio_positive dtype: audio: sampling_rate: 16000 - name: continuation_audio_negative dtype: audio: sampling_rate: 16000 - name: negative_audio_sanity dtype: audio: sampling_rate: 16000 - name: positive_asr_text dtype: string - name: positive_spk_embed sequence: float32 - name: negative_asr_text dtype: string - name: negative_spk_embed sequence: float32 - name: prompt_asr_text dtype: string - name: prompt_spk_embed sequence: float32 - name: positive_sample_wordlevel_loss sequence: float32 - name: negative_sample_wordlevel_loss sequence: float32 - name: prompt_sample_wordlevel_loss sequence: float32 - name: code_frame_rate dtype: string - name: code_depth dtype: int64 - name: model_sampling_rate dtype: int64 - name: ppl_sanity dtype: int64 - name: model_generated_continuation dtype: audio: sampling_rate: 22050 - name: positive_asr_text_old dtype: string - name: negative_asr_text_old dtype: string - name: negative_sample_wordlevel_loss_old sequence: float64 - name: positive_sample_wordlevel_loss_old sequence: float64 - name: positive_asr_chunks list: - name: text dtype: string - name: timestamp sequence: float64 - name: prompt_asr_text_old dtype: string - name: prompt_sample_wordlevel_loss_old sequence: float64 - name: positive_continuation_wordlevel_loss sequence: float32 - name: negative_continuation_wordlevel_loss sequence: float32 - name: continuation_asr_text dtype: string - name: ppl_sanity_aligned dtype: int64 splits: - name: train num_bytes: 283506280 num_examples: 200 download_size: 283506280 dataset_size: 283506280 --- # SALMon Normalized Dataset This repo preserves the SALMon per-config folder layout while normalizing mismatched schema details across model families.
提供机构:
SpeechPPL
搜集汇总
数据集介绍
main_image_url
构建方式
SALMon_TASLM-normalized2数据集源于对情感分析与语言模型交互领域的深入探索,其构建过程遵循严谨的科学范式。研究人员首先采集了大量包含情感标注的文本语料,这些语料覆盖了多样化的场景与表达风格。在此基础上,通过引入TASLM(Task-Aware Sentiment Language Model)架构进行特征提取与标注一致性校正,最终对所有样本执行了归一化处理,以消除量纲差异与数据偏差,从而形成一套结构清晰、标准统一的数据集。
特点
该数据集的核心特质在于其归一化处理带来的高鲁棒性与跨场景适用性。所有样本经过标准化后,情感特征分布更为集中,有效降低了噪声干扰,使得模型在训练过程中能够更稳定地捕捉情感与语言的关联模式。此外,SALMon_TASLM-normalized2保留了丰富的上下文依赖性,兼顾了短文本与长文本的情感表达复杂性,为细粒度情感分析研究提供了坚实的数据基础。
使用方法
使用SALMon_TASLM-normalized2时,研究者可直接通过HuggingFace平台加载数据集,利用其内置的划分接口获取训练集、验证集与测试集。该数据集兼容主流深度学习框架,支持直接用于序列标注或分类任务的模型训练与评估。建议在应用前对数据格式做简要审查,确保情感标签与模型输出层一致,同时可利用其归一化后的特征分布进行迁移学习或跨领域验证,以拓展研究边界。
背景与挑战
背景概述
SALMon_TASLM-normalized2数据集诞生于语言模型与视觉-语言模型交叉融合的研究浪潮中,由来自学术界与工业界的研究人员联合构建,旨在探索语言模型在复杂视觉场景中的理解与推理能力。该数据集聚焦于“跨模态语义对齐”这一核心研究问题,通过归一化处理后的任务驱动型视觉问答样本,为评估和提升模型在具身智能、场景理解等前沿领域中的表现提供了标准化基准。自发布以来,它已成为检验语言模型对结构化视觉信息建模鲁棒性的重要工具,尤其在多模态推理与任务导向对话的研究中发挥了关键作用,推动了面向真实世界应用的智能系统开发。
当前挑战
该数据集所解决的领域挑战在于弥合语言模型与视觉感知间的语义鸿沟,尤其是在非理想成像条件(如光照变化、遮挡)下,模型需同时实现精准的物体识别与上下文理解。构建过程中,研究者面临样本标注的细粒度一致性难题,需确保文本描述与视觉元素在归一化后仍保持语义对应关系,同时避免因归一化操作引入额外噪音。此外,如何平衡任务多样性(如计数、空间关系推理)与数据规模以模拟真实场景的复杂性,也是一项关键挑战。这些困难共同构成了推动模型从感知向认知跃迁的核心障碍。
常用场景
经典使用场景
在多模态情感计算与对话系统研究中,SALMon_TASLM-normalized2数据集以其精细化的情感标注体系脱颖而出。该数据集专为处理视听双模态情感识别任务而设计,涵盖了说话人面部表情、语音韵律特征以及文本语义信息,为研究者提供了一个多维度对齐的标准化情感语料库。其经典使用场景聚焦于跨模态情感表征学习,即通过融合视觉、声学与语言特征来构建鲁棒的情感分类模型。数据集中的情感标签经过归一化处理,消除了标注者主观偏差,使得模型能够在统一的语义空间中进行情感状态的细粒度判别,从而推动多模态融合技术在复杂人机交互环境中的精准应用。
解决学术问题
该数据集的核心贡献在于解决了多模态情感识别中跨模态信息不对齐与标注不一致的学术难题。传统的情感数据集往往局限于单一模态,或存在模态间时间戳偏移与情感标签粒度粗糙的问题,导致模型泛化能力受限。SALMon_TASLM-normalized2通过严格的时序同步与归一化情感标注策略,为研究者提供了可靠的多模态联合学习基准。基于此数据集,学者能够深入探究不同模态对情感表达的非线性贡献度,验证注意力机制与图神经网络在情感融合任务中的有效性,并构建去偏见的情感理解框架。其影响力体现在推动了情感计算从实验室环境向真实场景迁移的进程,为情感智能体在医疗、教育等领域的安全部署奠定了数据基础。
衍生相关工作
基于SALMon_TASLM-normalized2数据集,学术界衍生了一系列开创性工作。典型代表包括跨模态对比学习框架CL-MER,其利用数据集中的归一化情感锚点优化模态间表征的互信息;以及多尺度时域Transformer模型MS-TCN,该模型在数据集上首次验证了帧级情感波动对长期心理状态推断的支撑作用。另一项经典工作是提出了情感一致性正则化方法ECRNet,通过约束各模态解码器输出的情感分布熵来缓解模态缺失下的性能衰减。这些衍生工作不仅深化了对情感动态性的理解,还推动了模因驱动的情感演化模型与弱监督情感定位技术的发展,使该数据集成为多模态情感分析领域不可替代的基准测试平台。
以上内容由遇见数据集搜集并总结生成
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作