Multi-Keyboard Acoustic (MKA) Datasets

Name: Multi-Keyboard Acoustic (MKA) Datasets
Creator: Mendeley Data
Published: 2025-05-01 06:46:30
License: No description available

DataCite Commons; updated 2025-05-01; indexed 2025-05-17

Download link:

https://data.mendeley.com/datasets/bpt2hvf8n3

Resource description:

Our research team from the Computer Science Department at the University of Halabja has developed a dataset collection named the Multi-Keyboard Acoustic (MKA) Datasets. Designed to aid keyboard sound recognition and analysis, the MKA Datasets address the need to defend against acoustic-based cyber threats, a need that grows as cyberattacks become more sophisticated. The datasets comprise detailed recordings from six commonly used platforms: HP, Lenovo, MSI, Mac, Messenger, and Zoom. Each platform's dataset includes raw recordings, segmented sound files, and matrices derived from these sounds, capturing the subtle variations in typing behavior across different devices and applications.

We organize the MKA datasets for ease of use and thorough analysis. Each platform has a dedicated folder containing subfolders for raw data, segmented sound files, and matrices. An additional aggregated folder combines data from all platforms to support cross-platform analysis. In total, the MKA datasets consist of around 2630 files with .wav extensions for sound segments, as well as an equal number of matrix and .txt files. The number of files varies by platform, with approximately 70 files each for HP, Lenovo, MSI, Zoom, and Messenger, and 61 files for Mac.

Within each platform's dataset, the "Sound segments" folder stores six one-second WAV audio excerpts derived from the corresponding raw data files for each class. The excerpts are renamed using the convention "class_name+1" to "class_name+6" for each platform individually and "class_name+platform_name1" to "class_name+platform_name6" for the aggregated datasets. The "Sound segment (.matrix)" folder contains feature representations, such as MFCCs, extracted from each sound segment. The "Sound segment metadata (.txt)" folder holds detailed information for each sound segment, including recording conditions, platform information, and keystroke class labels.

Beyond cybersecurity, the MKA datasets have potential applications in domains such as speech recognition and natural language processing. By providing a diverse set of sound profiles, the datasets support the development of more robust and adaptable algorithms in these fields. This versatility makes the MKA datasets useful not only for advancing cybersecurity research but also for improving the efficiency and accuracy of human-computer interaction technologies. Through this work, we aim to contribute to both academic research and practical applications in these interconnected areas.
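
The naming convention described above can be illustrated with a short Python sketch. It is not taken from the dataset itself and rests on assumptions: that the "+" in "class_name+1" denotes plain concatenation, that platform labels appear in file names exactly as spelled in the description, and that the helper names `segment_names` and `parse_segment_name` are hypothetical.

```python
import os
import re

# Platform labels as listed in the description. Their exact spelling inside
# real file names is an assumption.
PLATFORMS = ("HP", "Lenovo", "MSI", "Mac", "Messenger", "Zoom")


def segment_names(class_name, platform=None, ext=".wav"):
    """Return the six expected segment file names for one keystroke class.

    Assumes "+" in the description means concatenation, so class "a" gives
    "a1.wav" .. "a6.wav", and "aZoom1.wav" .. "aZoom6.wav" when a platform
    is supplied (aggregated datasets).
    """
    middle = platform or ""
    return [f"{class_name}{middle}{i}{ext}" for i in range(1, 7)]


# Class name (shortest possible), optional platform label, segment index 1-6.
_STEM_PATTERN = re.compile(
    r"(?P<cls>.+?)(?P<platform>" + "|".join(PLATFORMS) + r")?(?P<idx>[1-6])"
)


def parse_segment_name(file_name):
    """Split a segment file name into (class_name, platform, index).

    Returns None when the name does not fit the assumed convention.
    The platform is None for per-platform (non-aggregated) names.
    """
    stem, _ext = os.path.splitext(file_name)
    match = _STEM_PATTERN.fullmatch(stem)
    if match is None:
        return None
    return match.group("cls"), match.group("platform"), int(match.group("idx"))
```

For example, `parse_segment_name("SpaceZoom6.wav")` would separate the class label from the platform label under these assumptions. If the real files use a literal "+" or a different separator, the pattern would need adjusting.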

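Because the description states that each sound segment has a matching matrix file and metadata file, a downloaded copy can be sanity-checked by counting files per subfolder. The sketch below is one way to do that, under stated assumptions: the three subfolder names are quoted from the description, but the top-level layout (one directory per platform directly under a root), the ".matrix" file extension, and the function names are hypothetical.

```python
from pathlib import Path

# Subfolder names quoted from the dataset description, mapped to the file
# extension expected inside each one.
SUBFOLDER_SUFFIXES = {
    "Sound segments": ".wav",
    "Sound segment (.matrix)": ".matrix",  # extension assumed from the folder name
    "Sound segment metadata (.txt)": ".txt",
}


def count_segment_files(root):
    """Count matching files per platform folder and subfolder under `root`.

    Assumes each immediate subdirectory of `root` is one platform
    (or the aggregated set). Missing subfolders count as zero files.
    """
    counts = {}
    for platform_dir in sorted(Path(root).iterdir()):
        if not platform_dir.is_dir():
            continue
        per_folder = {}
        for folder, suffix in SUBFOLDER_SUFFIXES.items():
            sub = platform_dir / folder
            if sub.is_dir():
                per_folder[folder] = sum(
                    1 for p in sub.iterdir() if p.suffix.lower() == suffix
                )
            else:
                per_folder[folder] = 0
        counts[platform_dir.name] = per_folder
    return counts


def mismatched_platforms(counts):
    """Return platforms whose three subfolders hold different file counts."""
    return [name for name, c in counts.items() if len(set(c.values())) > 1]
```

An empty result from `mismatched_platforms(count_segment_files(root))` would be consistent with the "equal number" statement in the description; a non-empty result points to folders worth inspecting by hand.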

Provider:

Mendeley Data

Created:

2024-06-11

Collection summary

Dataset introduction