Korean Voice Phishing Detection Dataset with Multilingual Back-Translation and SMOTE Augmentations

Name: Korean Voice Phishing Detection Dataset with Multilingual Back-Translation and SMOTE Augmentations
Creator: IEEE DataPort
Published: 2024-11-11 10:03:28
License: No description available

Source: DataCite Commons; updated 2024-11-11; indexed 2025-04-16

Download link:

https://ieee-dataport.org/documents/korean-voice-phishing-detection-dataset-multilingual-back-translation-and-smote

Resource summary:

This dataset contains the original and augmented versions of the Korean Call Content Vishing (KorCCVi v2) dataset used in the study "Enhancing Voice Phishing Detection Using Multilingual Back-Translation and SMOTE: An Empirical Study." It addresses data imbalance and asymmetry in Korean voice phishing detection through data augmentation techniques such as multilingual back-translation (BT), with English, Chinese, and Japanese as intermediate (pivot) languages, and the Synthetic Minority Oversampling Technique (SMOTE). The augmented dataset is intended for machine learning (ML) and deep learning (DL) applications in natural language processing (NLP) and cybersecurity research.
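
The description does not say which translation system the study used. The sketch below only illustrates the general idea of multilingual back-translation: each Korean transcript is translated into a pivot language (English, Chinese, or Japanese) and back into Korean, yielding one candidate paraphrase per pivot. The `translate` callable, the `back_translate` and `augment` functions, and the toy translator are all hypothetical; a real pipeline would plug a machine-translation model or service into `translate`.

```python
from typing import Callable, List

# Hypothetical translator interface: translate(text, src_lang, tgt_lang) -> str.
# A real pipeline would wrap a machine-translation model or service here.
Translator = Callable[[str, str, str], str]

# Pivot languages named in the dataset description (ISO 639-1 codes).
PIVOTS = ["en", "zh", "ja"]


def back_translate(text: str, translate: Translator,
                   pivots: List[str] = PIVOTS, src: str = "ko") -> List[str]:
    """Return one candidate paraphrase per pivot: src -> pivot -> src."""
    variants = []
    for pivot in pivots:
        intermediate = translate(text, src, pivot)
        variants.append(translate(intermediate, pivot, src))
    return variants


def augment(samples: List[str], translate: Translator,
            pivots: List[str] = PIVOTS) -> List[str]:
    """Keep the originals, then append back-translated variants that are new."""
    seen = set(samples)
    out = list(samples)
    for text in samples:
        for variant in back_translate(text, translate, pivots):
            if variant not in seen:  # identical round-trips add no diversity
                seen.add(variant)
                out.append(variant)
    return out


if __name__ == "__main__":
    # Toy stand-in translator that only tags text with the target language code.
    def toy(text: str, src: str, tgt: str) -> str:
        return f"[{tgt}]{text}"

    # prints ['sample call transcript', '[ko][en]sample call transcript',
    #         '[ko][zh]sample call transcript', '[ko][ja]sample call transcript']
    print(augment(["sample call transcript"], toy))
```

Dropping exact duplicates matters because a round-trip through a pivot language can return the input unchanged, which would only repeat an existing sample.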

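
SMOTE operates on numeric feature vectors, not raw text, and the description does not state which features the study oversampled (TF-IDF vectors or embeddings are plausible assumptions). The following is a minimal pure-Python sketch of standard SMOTE: each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours. The names `smote` and `k_nearest` are hypothetical; in practice the `SMOTE` class of the imbalanced-learn library provides this via `fit_resample`.

```python
import math
import random
from typing import List

Vector = List[float]


def _distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(minority: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k minority samples closest to minority[i], excluding i itself."""
    ranked = sorted(
        (_distance(minority[i], minority[j]), j)
        for j in range(len(minority)) if j != i
    )
    return [j for _, j in ranked[:k]]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (standard SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(
            [b + gap * (n - b) for b, n in zip(minority[i], minority[j])]
        )
    return synthetic
```

To fully balance a binary dataset, one would typically request `n_synthetic = majority_count - minority_count` synthetic phishing samples. Because every new point is a convex combination of two real minority samples, it always stays inside the region spanned by the existing minority class.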

Provider:

IEEE DataPort

Created:

2024-11-11


Dataset introduction