Techniques for manipulation detection in speech signal using machine learning and acoustic signal processing

Creator: Thammasat University
Published: 2023-09-27 07:30:58
License: No description provided

DataCite Commons: updated 2023-09-27, indexed 2025-04-16

Download link:

http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2022.815

Description:

Speech signals are used in many forms and social applications in cyber-physical systems (CPS), such as voice commands, voice activation, and voice recognition. However, high-end speech editing software, including voice conversion and speech synthesis tools, makes it easy for anyone to fabricate and alter speech signals. Misuse of this technology threatens the security of speech technology and causes social problems as unauthenticated speech becomes more common. Unauthenticated speech can be used for criminal purposes, such as theft or fraud, in any CPS. Attacks that use unauthenticated speech signals, such as tampered, spoofed, and modified speech, are considered an emerging threat, so speech signals need to be secured. Cryptography is a classical method that secures speech signals by concealing them so that they cannot be tampered with or modified; however, cryptography cannot detect tampering or modification in speech signals. This research focuses on techniques for detecting the manipulation of speech signals using machine learning and acoustic signal processing.

This research addresses the security of speech signals through two objectives. The first objective is to protect the genuineness of the speech signal. If attackers try to modify or change the speech signal, audio information hiding (AIH) can protect its genuineness through tampering detection. A crucial property of information hiding is that the hidden information should be difficult to remove from the watermarked signal, and that if attacks are performed on the watermarked signal, the hidden information should reflect the change. The second objective is to secure automatic speaker verification through spoof detection. Spoof detection rejects spoofed speech and allows only authentic speech to be processed, which raises the accuracy of the automatic speaker verification system.
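
The idea that "the hidden information should reflect the change" can be illustrated by comparing extracted watermark bits with the expected bits segment by segment. This is a minimal sketch under assumptions the abstract does not state: the watermark is a known bit sequence, each signal segment carries a fixed number of bits, and a segment is flagged when its bit error rate exceeds a threshold. The function `locate_tampered_segments` and its parameters `bits_per_segment` and `ber_threshold` are hypothetical; the thesis's actual extraction and decision rules may differ.

```python
from typing import List


def locate_tampered_segments(
    embedded_bits: List[int],
    extracted_bits: List[int],
    bits_per_segment: int = 8,     # hypothetical: bits carried by each segment
    ber_threshold: float = 0.2,    # hypothetical: tolerated bit error rate
) -> List[int]:
    """Return indices of segments whose bit error rate exceeds ber_threshold."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit sequences must have equal length")
    tampered = []
    for start in range(0, len(embedded_bits), bits_per_segment):
        ref = embedded_bits[start:start + bits_per_segment]
        got = extracted_bits[start:start + bits_per_segment]
        errors = sum(r != g for r, g in zip(ref, got))
        # A segment is reported only when its errors exceed the tolerance.
        if errors / len(ref) > ber_threshold:
            tampered.append(start // bits_per_segment)
    return tampered


if __name__ == "__main__":
    embedded = [1, 0, 1, 1, 0, 0, 1, 0] * 4   # 4 segments of 8 bits
    extracted = list(embedded)
    extracted[0] ^= 1                         # one flipped bit in segment 0
    for i in range(16, 20):                   # four flipped bits in segment 2
        extracted[i] ^= 1
    print(locate_tampered_segments(embedded, extracted))  # -> [2]
```

A single flipped bit in segment 0 (1/8 = 0.125) stays below the threshold, loosely mimicking the tolerance a semi-fragile scheme needs toward mild processing, while four flipped bits in segment 2 (4/8 = 0.5) are reported as tampering.
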

Based on the literature review, several information hiding techniques have been developed, and singular spectrum analysis (SSA)-based AIH has shown strong robustness owing to the invariance of the singular spectrum. Moreover, SSA-based AIH can be designed to be semi-fragile (robust against non-malicious attacks but fragile to malicious attacks) by properly selecting the part of the singular spectrum to be modified. This possibility of semi-fragility in SSA-based AIH motivates the construction of a tampering detection scheme. In addition, we use a convolutional neural network (CNN) for parameter estimation instead of the differential evolution-based method adopted in the original SSA-based AIH. For the first objective, the experimental results showed that the proposed scheme could locate tampered areas correctly and could also roughly predict the types and degrees of tampering. CNN-based parameter estimation could significantly reduce computational time, and the scheme is entirely blind because the estimator can suggest the parameters for both the embedding and extraction processes. However, the tampering detection accuracy needs to be improved, since the proposed scheme is fragile to MP4 compression and robust to echo addition.

For the second objective, we focus on securing speech signals by providing spoof detection for an automatic speaker verification system. In this work, we investigate spoof detection performance when different percentages of voice and non-voice segments are used. Mel-frequency cepstral coefficients (MFCCs) and linear-frequency cepstral coefficients (LFCCs) are computed from the optimal section as features, and a ResNet-34 model is used for classification.
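
The SSA machinery that an SSA-based AIH scheme modifies can be sketched as follows: a frame is embedded into a Hankel (trajectory) matrix, decomposed by SVD into its singular spectrum, a chosen range of singular values is scaled, and the frame is rebuilt by diagonal averaging. This is a hedged sketch assuming NumPy; the function names, the window length `L`, the index range `start:stop`, and the multiplicative `scale` are hypothetical stand-ins for the embedding parameters that the thesis estimates with a CNN (and the original scheme with differential evolution). The actual bit-embedding and extraction rules are not reproduced here.

```python
import numpy as np


def trajectory_matrix(x: np.ndarray, L: int) -> np.ndarray:
    """Hankel (trajectory) matrix of shape (L, K), with K = len(x) - L + 1."""
    if not 1 < L < len(x):
        raise ValueError("window length L must satisfy 1 < L < len(x)")
    K = len(x) - L + 1
    return np.array([x[i:i + K] for i in range(L)])


def diagonal_average(X: np.ndarray) -> np.ndarray:
    """Rebuild a series of length L + K - 1 by averaging each anti-diagonal of X."""
    L, K = X.shape
    total = np.zeros(L + K - 1)
    count = np.zeros(L + K - 1)
    for i in range(L):
        total[i:i + K] += X[i]
        count[i:i + K] += 1
    return total / count


def ssa_modify(x, L: int, start: int, stop: int, scale: float):
    """Scale singular values s[start:stop]; return (rebuilt_frame, original_spectrum)."""
    X = trajectory_matrix(np.asarray(x, dtype=float), L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s is sorted in descending order
    s_mod = s.copy()
    s_mod[start:stop] *= scale  # hypothetical embedding lever: rescale one band of the spectrum
    X_mod = (U * s_mod) @ Vt     # U * s_mod scales column k of U by s_mod[k]
    return diagonal_average(X_mod), s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(256) / 256
    frame = np.sin(2 * np.pi * 4 * t) + 0.1 * rng.standard_normal(256)
    rebuilt, spectrum = ssa_modify(frame, L=32, start=8, stop=16, scale=1.0)
    print(np.allclose(rebuilt, frame))  # -> True
```

With `scale=1.0` the frame is reconstructed exactly, because the unmodified trajectory matrix is already Hankel and diagonal averaging recovers it; any other scale changes the matrix only along the selected components before diagonal averaging. Choosing which part of the spectrum to modify is the lever the abstract describes for trading robustness against fragility.
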

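The MFCC and LFCC features mentioned above can be sketched with NumPy alone. The two differ only in how the triangular filters are spaced: on the mel scale for MFCCs and uniformly in hertz for LFCCs. The frame length, hop, FFT size, filter count, and number of coefficients below are illustrative assumptions (25 ms frames with a 10 ms hop at 16 kHz), not the settings used in the thesis, and neither the voice/non-voice section selection nor the ResNet-34 classifier is shown.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filterbank(n_filters, n_fft, sr, scale="linear"):
    """Triangular filters spaced uniformly in Hz (LFCC) or on the mel scale (MFCC)."""
    f_max = sr / 2.0
    if scale == "linear":
        edges = np.linspace(0.0, f_max, n_filters + 2)
    elif scale == "mel":
        edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(f_max), n_filters + 2))
    else:
        raise ValueError("scale must be 'linear' or 'mel'")
    bins = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # centre frequency of each FFT bin
    fb = np.zeros((n_filters, len(bins)))
    for m in range(n_filters):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (centre - left)
        falling = (right - bins) / (right - centre)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    N = x.shape[-1]
    n = np.arange(N)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    norm = np.full(n_out, np.sqrt(2.0 / N))
    norm[0] = np.sqrt(1.0 / N)
    return (x @ basis.T) * norm


def cepstral_features(signal, sr, scale="linear", frame_len=400, hop=160,
                      n_fft=512, n_filters=20, n_ceps=20):
    """Frame-level LFCC (scale='linear') or MFCC (scale='mel'), shape (frames, n_ceps)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)           # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    fb = triangular_filterbank(n_filters, n_fft, sr, scale)
    energies = np.maximum(power @ fb.T, 1e-10)              # avoid log(0)
    return dct_ii(np.log(energies), n_ceps)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    print(cepstral_features(tone, sr, "linear").shape)  # -> (98, 20)
```

Both feature matrices have shape (frames, n_ceps) and could be fed as 2-D inputs to a CNN classifier. For real experiments, torchaudio provides tested `MFCC` and `LFCC` transforms.
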