Weighted factor automata: A finite-state framework for spoken content retrieval

Mendeley Data, updated 2024-01-31, indexed 2024-06-27

Download link:

https://digitallibrary.usc.edu/asset-management/2A3BF1W4433K

Description:

Spoken Content Retrieval (SCR) integrates Automatic Speech Recognition (ASR) and Information Retrieval (IR) to provide access to large multimedia archives based on their contents. Several tasks of varying difficulty fall under the SCR umbrella. Among them, Keyword Search (KWS) is one of the harder tasks: the goal is to locate exact matches to an open-vocabulary query term in a large, heterogeneous speech corpus. Retrieval must be fast, so the data has to be indexed ahead of time. ASR transcripts are often highly erroneous in real-world scenarios because of model weaknesses, especially in languages and domains where supervised resources are limited, so all of these requirements must be met with imperfect information about which words occur where in the corpus.

We present an efficient, flexible and theoretically sound framework for SCR based on weighted finite-state transducers. While we mainly focus on the challenging KWS task, the algorithms and representations we propose are applicable in a wide variety of scenarios where the inputs can be represented as lattices, i.e., acyclic weighted finite-state automata. Our contributions include i) novel techniques for indexing and searching a collection of ASR lattices for KWS, ii) a new algorithm for computing and indexing exact posterior probabilities for all substrings in a lattice, iii) a recipe for computing and indexing probabilistic generalizations of statistics widely used in IR, such as term frequency (TF), inverse document frequency (IDF) and TF-IDF, for all substrings in a collection of lattices, iv) a new algorithm for computing and indexing posterior-weighted alignments between substrings in a time-aligned reference string and substrings in an ASR lattice, and v) a novel approach for performing open-vocabulary KWS by explicitly modeling ASR errors and redistributing lattice-based posterior estimates based on sub-word-level confusions.
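
To make the idea of a substring statistic over a lattice concrete, the following is a minimal sketch, not the framework's own algorithm. It computes the expected count of a single query (a probabilistic term frequency) on a toy lattice with a forward-backward pass. The lattice, its weights, and the names `ARCS`, `forward_backward` and `expected_count` are all hypothetical. The framework described above indexes all substrings ahead of time, whereas this sketch evaluates one query on the fly.

```python
from collections import defaultdict

# Hypothetical toy lattice: (source state, destination state, word, probability).
# Assumption: states are numbered in topological order; 0 is initial, 3 is final.
ARCS = [
    (0, 1, "new", 0.6),
    (0, 1, "knew", 0.4),
    (1, 2, "york", 0.7),
    (1, 2, "fork", 0.3),
    (2, 3, "city", 1.0),
]
START, FINAL = 0, 3


def forward_backward(arcs, start, final):
    """Return per-state forward (alpha) and backward (beta) path-weight sums."""
    alpha = defaultdict(float)
    beta = defaultdict(float)
    alpha[start] = 1.0
    beta[final] = 1.0
    # Forward pass: visit arcs by increasing source state.
    for src, dst, _, w in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] += alpha[src] * w
    # Backward pass: visit arcs by decreasing destination state.
    for src, dst, _, w in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[src] += w * beta[dst]
    return alpha, beta


def expected_count(arcs, start, final, query):
    """Expected number of occurrences of the word sequence `query` in the lattice."""
    alpha, beta = forward_backward(arcs, start, final)
    total = alpha[final]  # total weight of all complete paths
    out = defaultdict(list)
    for src, dst, word, w in arcs:
        out[src].append((dst, word, w))

    def match(state, i):
        # Weight of all arc sequences leaving `state` that spell query[i:],
        # multiplied by the backward weight at the state where the match ends.
        if i == len(query):
            return beta[state]
        return sum(
            w * match(dst, i + 1)
            for dst, word, w in out[state]
            if word == query[i]
        )

    states = {s for arc in arcs for s in arc[:2]}
    return sum(alpha[s] * match(s, 0) for s in states) / total


if __name__ == "__main__":
    print(expected_count(ARCS, START, FINAL, ["new", "york"]))
```

On this toy lattice the query `["new", "york"]` yields approximately 0.42 (0.6 × 0.7). When a query can occur at most once on any path, as here, the expected count coincides with the posterior probability that the query occurs; in general the two quantities differ.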

Created:

2024-01-31
