Attention2Probability
ModelScope Community | Updated: 2025-12-04 | Indexed: 2025-12-06
Download link:
https://modelscope.cn/datasets/ByteDance/Attention2Probability
Resource description:
# Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System
<p align="center">
<a href="https://arxiv.org/abs/2508.18701" alt="paper"><img src="https://img.shields.io/badge/Paper-A2P-blue?logo=arxiv&logoColor=white"/></a>
<a href="https://huggingface.co/ByteDance/Attention2Probability" alt="Model"><img src="https://img.shields.io/badge/Model-A2P-yellow?logo=huggingface"/></a>
<a href="https://huggingface.co/datasets/ByteDance/Attention2Probability" alt="Dataset"><img src="https://img.shields.io/badge/Dataset-A2P-yellow?logo=huggingface"/></a>
</p>
Attention2Probability (A2P) is a lightweight terminology-intervention scheme for speech-to-text. Its core approach uses the cross-attention mechanism to retrieve terms that are likely to appear in the audio, then adds those terms to the LLM's prompt to carry out the terminology intervention.
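To make the retrieve-then-prompt idea concrete, here is a toy, pure-Python sketch. It is not the A2P model: the embeddings are hand-made vectors, and the scoring rule (scaled dot-product attention from each candidate term to the audio frames, attention-pooled and passed through a sigmoid), the 0.5 threshold, and the prompt template are all assumptions chosen for illustration. The functions `term_probabilities` and `build_prompt` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def term_probabilities(term_embs: Dict[str, Vector],
                       audio_frames: List[Vector]) -> Dict[str, float]:
    """Hypothetical scoring: each term embedding acts as a query over the
    audio frames (keys); the attention-weighted average of the scaled
    similarity scores is squashed into (0, 1)."""
    d = len(audio_frames[0])
    probs = {}
    for term, q in term_embs.items():
        scores = [dot(q, k) / math.sqrt(d) for k in audio_frames]
        weights = softmax(scores)
        pooled = sum(w * s for w, s in zip(weights, scores))
        probs[term] = sigmoid(pooled)
    return probs


def build_prompt(probs: Dict[str, float], threshold: float = 0.5,
                 top_k: int = 5) -> str:
    """Put the most probable terms into the LLM prompt (assumed template)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    selected = [t for t, p in ranked if p >= threshold][:top_k]
    if not selected:
        return "Transcribe the audio."
    return "Transcribe the audio. Possible terms: " + ", ".join(selected) + "."


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0]]
    probs = term_probabilities({"A2P": [4.0, 0.0], "noise": [-4.0, 0.0]}, frames)
    print(build_prompt(probs))
```

In this toy setup the term whose embedding aligns with the audio frames gets a high probability and is the only one placed in the prompt; a real system would learn both the embeddings and the scoring.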
## Data description
This project does not provide the audio data for LibriSpeech and AISHELL-2; please obtain them from their original sources. All training data is provided in the `data_json` folder. Before use, the path prefixes in these files must be changed to point to your local copy of the audio.
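The prefix change can be scripted. The sketch below assumes each file in `data_json` is a JSON array of objects whose audio path is stored under a key named `audio`; both the file layout and the key name are assumptions, so check the actual files before using it. `rewrite_prefix` and `rewrite_file` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Dict, List

AUDIO_KEY = "audio"  # hypothetical field name; verify against the files in data_json


def rewrite_prefix(records: List[Dict], old_prefix: str, new_prefix: str) -> List[Dict]:
    """Return copies of the records with old_prefix replaced by new_prefix
    at the start of the audio path; other records are left unchanged."""
    out = []
    for rec in records:
        rec = dict(rec)  # shallow copy so the caller's data is not mutated
        path = rec.get(AUDIO_KEY)
        if isinstance(path, str) and path.startswith(old_prefix):
            rec[AUDIO_KEY] = new_prefix + path[len(old_prefix):]
        out.append(rec)
    return out


def rewrite_file(src: Path, dst: Path, old_prefix: str, new_prefix: str) -> None:
    """Load a JSON array from src, rewrite the prefixes, and save it to dst."""
    records = json.loads(src.read_text(encoding="utf-8"))
    fixed = rewrite_prefix(records, old_prefix, new_prefix)
    dst.write_text(json.dumps(fixed, ensure_ascii=False, indent=2), encoding="utf-8")
```

If the files turn out to be JSON Lines (one object per line), parse and rewrite each line separately instead of loading the whole file as one array.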
## Training steps
For English, first pre-train on the LibriSpeech dataset. Then run the second training stage, also on LibriSpeech, by changing the settings in the dataset configuration.
For Chinese, retrieving a single character in isolation has little practical value, so the Retriever can be trained directly on the AISHELL-2 dataset. Finally, the models for both languages are fine-tuned on real-world data.
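The stage order above can be summarized as data. This is only a restatement of the README in code form: the stage names and the `Stage`/`describe` helpers are illustrative, not keys or scripts from the project's configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stage:
    name: str
    dataset: str
    note: str = ""


# Stage order as described in the README; names are illustrative only.
SCHEDULE: Dict[str, List[Stage]] = {
    "en": [
        Stage("pretrain", "LibriSpeech"),
        Stage("stage2", "LibriSpeech", "after changing the dataset configuration"),
        Stage("finetune", "real-world data"),
    ],
    "zh": [
        # Retrieving single characters alone is not useful for Chinese,
        # so the Retriever is trained on AISHELL-2 directly.
        Stage("train_retriever", "AISHELL-2"),
        Stage("finetune", "real-world data"),
    ],
}


def describe(lang: str) -> List[str]:
    """Render the stages for one language as numbered lines."""
    return [
        f"{i}. {s.name} on {s.dataset}" + (f" ({s.note})" if s.note else "")
        for i, s in enumerate(SCHEDULE[lang], start=1)
    ]


if __name__ == "__main__":
    for lang in SCHEDULE:
        print(lang, describe(lang))
```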
## Citation
If you find A2P useful, please cite the paper:
```
@misc{du2025attention2probabilityattentiondriventerminologyprobability,
title={Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System},
author={Yanfan Du and Jun Zhang and Bin Wang and Jin Qiu and Lu Huang and Yuan Ge and Xiaoqian Liu and Tong Xiao and Jingbo Zhu},
year={2025},
eprint={2508.18701},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.18701},
}
```
Provided by: maas
Created: 2025-08-28



