Layers and parameters of the 2DM-CNN model.

Source: Figshare. Updated 2024-04-26; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Layers_and_parameters_of_the_2DM-CNN_model_/25703768

Description:

Digital speech recognition is a challenging problem: a model must learn complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often struggle to recognize. This article introduces three solutions based on convolutional neural networks (CNNs). 1D-CNN is designed to learn directly from the digital signal data; 2DS-CNN and 2DM-CNN have more complex architectures and use the Fourier transform to convert the raw waveform into images from which they learn the essential features. Experiments on four large datasets of 30,000 samples each show that the three proposed models outperform well-known models such as GoogLeNet and AlexNet, with best accuracies of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher performance than other models, the proposed solution demonstrates the ability to learn features effectively and to improve recognition accuracy and speed, and it opens up potential for broad applications in virtual assistants, medical recording, and voice commands.
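The waveform-to-image step described above can be illustrated with a short-time Fourier transform. The following is only a minimal sketch, not the authors' pipeline: the description does not state the frame length, hop size, window, or scaling used for 2DS-CNN and 2DM-CNN, so the function name `waveform_to_spectrogram`, the parameters `frame_len` and `hop`, the Hann window, and the 8 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np


def waveform_to_spectrogram(signal, frame_len=256, hop=128):
    """Turn a 1-D waveform into a 2-D log-magnitude spectrogram.

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values taken from the dataset description.
    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short inputs so at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real-input FFT of every frame gives the magnitude spectrum over time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression keeps quiet components visible; transpose so that
    # rows are frequency bins and columns are time frames, like an image.
    return np.log1p(magnitude).T


# Assumed test input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

image = waveform_to_spectrogram(tone)
print(image.shape)  # (129, 61)
```

A 2-D CNN would take an array like `image` as a single-channel input, whereas a 1-D CNN would consume `tone` directly. With these assumed settings each frequency bin spans 8000 / 256 = 31.25 Hz, so the 1 kHz tone shows up as a bright row at bin 32.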

Created:

2024-04-26
