
Attention2Probability

ModelScope Community, updated 2025-12-04, indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/ByteDance/Attention2Probability
Resource description:
# Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System

<p align="center">
<a href="https://arxiv.org/abs/2508.18701" alt="paper"><img src="https://img.shields.io/badge/Paper-A2P-blue?logo=arxiv&logoColor=white"/></a>
<a href="https://huggingface.co/ByteDance/Attention2Probability" alt="Model"><img src="https://img.shields.io/badge/Model-A2P-yellow?logo=huggingface"/></a>
<a href="https://huggingface.co/datasets/ByteDance/Attention2Probability" alt="Dataset"><img src="https://img.shields.io/badge/Dataset-A2P-yellow?logo=huggingface"/></a>
</p>

Attention2Probability (A2P) is a lightweight intervention scheme for speech terminology. Its core idea is to use the cross-attention mechanism to retrieve terms that are likely to appear in the audio, then add those terms to the prompt of the large language model (LLM) to carry out the terminology intervention.

## Data description

This project does not provide the LibriSpeech and Aishell-2 audio data; please download them from their original sources. All training data is provided in the `data_json` folder. Before use, the path prefix in these files must be changed to match your local setup.

## Training steps

For English, first pre-train on the LibriSpeech dataset. Then run the second-stage training on LibriSpeech by modifying the settings in the dataset configuration. For Chinese, retrieving a single character in isolation has little practical value, so the Retriever can be trained directly on the Aishell-2 dataset. Finally, the models for both languages are fine-tuned on real-world data.
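The README describes the retrieve-then-prompt idea only at a high level. The following is a minimal, self-contained sketch of a generic version of that loop, not the authors' implementation. The scoring rule (a sigmoid over the largest scaled dot-product attention logit between a term embedding and any audio frame), the `bias`, `threshold` and `top_k` parameters, the function names, and the prompt template are all hypothetical. In the real system the Retriever is trained, whereas here the embeddings are simply passed in as plain lists.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def term_probabilities(
    audio_frames: Sequence[Vector],
    term_embeddings: Dict[str, Vector],
    bias: float = 0.0,
) -> Dict[str, float]:
    """Score how likely each term is to occur in the audio (toy rule, an assumption).

    Each term embedding is used as a cross-attention query against the audio
    frames (keys). The largest scaled dot-product logit over all frames,
    plus `bias`, is passed through a sigmoid to give a presence probability.
    """
    if not audio_frames:
        raise ValueError("audio_frames must be non-empty")
    probs: Dict[str, float] = {}
    for term, query in term_embeddings.items():
        scale = math.sqrt(len(query))
        logits = [sum(q * k for q, k in zip(query, frame)) / scale for frame in audio_frames]
        probs[term] = 1.0 / (1.0 + math.exp(-(max(logits) + bias)))
    return probs


def retrieve_terms(probs: Dict[str, float], threshold: float = 0.5, top_k: int = 5) -> List[str]:
    """Return up to `top_k` terms, most probable first, whose score is >= `threshold`."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [term for term, p in ranked[:top_k] if p >= threshold]


def build_prompt(terms: List[str]) -> str:
    """Inject the retrieved terms into a hypothetical LLM instruction."""
    if not terms:
        return "Transcribe the audio."
    return "Transcribe the audio. Possible terms: " + ", ".join(terms) + "."


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]  # two toy 2-d audio frame features
    terms = {"softmax": [4.0, 0.0], "tokenizer": [-4.0, -4.0]}  # toy term embeddings
    probs = term_probabilities(frames, terms)
    print(build_prompt(retrieve_terms(probs)))  # prints: Transcribe the audio. Possible terms: softmax.
```

Taking the maximum over frames means a term only needs to match one region of the utterance; averaging over frames instead would penalize short terms in long recordings.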
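The README says only that the path prefix in `data_json` must be changed before use; it does not document the file schema or the placeholder prefix. The sketch below assumes each file is a standard `.json` document (not JSON Lines). To avoid guessing field names, it rewrites every string value that begins with a prefix you supply. The function names and example prefixes are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def replace_prefix(node: Any, old: str, new: str) -> Any:
    """Recursively rewrite every string value that starts with `old`."""
    if isinstance(node, str):
        return new + node[len(old):] if node.startswith(old) else node
    if isinstance(node, list):
        return [replace_prefix(item, old, new) for item in node]
    if isinstance(node, dict):
        return {key: replace_prefix(value, old, new) for key, value in node.items()}
    return node  # numbers, booleans and null are left untouched


def rewrite_data_json(folder: str, old: str, new: str) -> int:
    """Rewrite every *.json file under `folder` in place; return how many changed.

    Assumption: each file holds a single JSON document (not JSON Lines).
    """
    changed = 0
    for path in Path(folder).rglob("*.json"):
        data = json.loads(path.read_text(encoding="utf-8"))
        updated = replace_prefix(data, old, new)
        if updated != data:
            path.write_text(json.dumps(updated, ensure_ascii=False, indent=2), encoding="utf-8")
            changed += 1
    return changed
```

For example, `rewrite_data_json("data_json", "/old/prefix/", "/data/LibriSpeech/")`, with both prefixes replaced by your actual values. Rewritten files are re-serialized with `indent=2`, so their original whitespace is not preserved.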
## Citation

If you find A2P useful, please cite the paper:

```
@misc{du2025attention2probabilityattentiondriventerminologyprobability,
  title={Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System},
  author={Yanfan Du and Jun Zhang and Bin Wang and Jin Qiu and Lu Huang and Yuan Ge and Xiaoqian Liu and Tong Xiao and Jingbo Zhu},
  year={2025},
  eprint={2508.18701},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.18701},
}
```

# Attention2Probability:面向鲁棒语音转文字系统的注意力驱动术语概率估计 <p align="center"> <a href="https://arxiv.org/abs/2508.18701" alt="paper"><img src="https://img.shields.io/badge/Paper-A2P-blue?logo=arxiv&logoColor=white"/></a> <a href="https://huggingface.co/ByteDance/Attention2Probability" alt="Model"><img src="https://img.shields.io/badge/Model-A2P-yellow?logo=huggingface"/></a> <a href="https://huggingface.co/datasets/ByteDance/Attention2Probability" alt="Dataset"><img src="https://img.shields.io/badge/Dataset-A2P-yellow?logo=huggingface"/></a> Attention2Probability(A2P)是一种面向语音术语的轻量级干预方案。其核心思路为利用交叉注意力机制检索音频中可能出现的术语,并将这些术语添加至大语言模型(LLM)的提示词中,以完成术语干预。 ## 数据集说明 本项目不提供LibriSpeech与Aishell-2的音频数据,请从其他渠道下载。所有训练数据均已存放于data_json文件夹中,使用前需修改路径前缀。 ## 训练流程 针对英语场景,需首先利用LibriSpeech数据集开展预训练;随后可通过修改数据集配置文件中的参数,在LibriSpeech数据集上进行第二阶段训练。针对中文场景,单独检索单个汉字并无实际应用价值,因此可直接使用Aishell-2数据集训练检索器(Retriever)。最终,针对两种语言的模型均需在真实场景数据上进行微调。 ## 引用 若您认为A2P对您的研究有所帮助,请引用如下论文: @misc{du2025attention2probabilityattentiondriventerminologyprobability, title={Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System}, author={Yanfan Du and Jun Zhang and Bin Wang and Jin Qiu and Lu Huang and Yuan Ge and Xiaoqian Liu and Tong Xiao and Jingbo Zhu}, year={2025}, eprint={2508.18701}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2508.18701}, }
Provided by:
maas
Created:
2025-08-28