MTUCI/RuASD
Source: Hugging Face · Updated: 2026-03-31 · Indexed: 2026-03-29
Download link: https://hf-mirror.com/datasets/MTUCI/RuASD
---
language:
- ru
tags:
- audio
- speech
- anti-spoofing
- audio-deepfake-detection
- tts
task_categories:
- audio-classification
pretty_name: RuASD
size_categories:
- 100K<n<1M
license: cc-by-nc-sa-4.0
---
# RuASD: Russian Anti-Spoofing Dataset
**RuASD** is a public Russian-language speech anti-spoofing dataset designed for developing and benchmarking audio deepfake detection systems. It combines spoofed utterances generated by 37 Russian-capable speech synthesis systems with bona fide recordings curated from multiple heterogeneous Russian speech corpora. In addition to clean audio, the dataset supports robustness-oriented evaluation through reproducible perturbations such as reverberation, additive noise, and codec-based channel degradation.
**Models:** ESpeech, F5-TTS, VITS, Piper, TeraTTS, MMS TTS, VITS2, GPT-SoVITS, CoquiTTS, XTTS, Fastpitch, RussianFastPitch, Bark, GradTTS, FishTTS, Pyttsx3, RHVoice, Silero, Fairseq Transformer, SpeechT5, Vosk-TTS, EdgeTTS, VK Cloud, SaluteSpeech, ElevenLabs
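The perturbations named above are not specified in detail in this card. Purely as an illustration of what a reproducible additive-noise perturbation can look like, the sketch below mixes white Gaussian noise into a signal at a chosen signal-to-noise ratio. The function name `add_noise_at_snr`, the choice of white noise, the fixed seed, and the synthetic tone are all assumptions made for this example; this is not the dataset's actual augmentation pipeline.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)  # a fixed seed keeps the perturbation reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the noise power so that signal_power / noise_power == 10^(snr_db / 10).
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]

# Toy input: one second of a 440 Hz tone at 16 kHz, standing in for real speech.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(sample_rate)]
noisy = add_noise_at_snr(clean, snr_db=10.0)
```

Seeding the random generator is what makes the perturbation repeatable across runs; a real pipeline would more likely draw noise from recorded noise corpora than from a Gaussian source.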
# Overview
- **Purpose:** Benchmark and develop Russian-language anti-spoofing and audio deepfake detection systems, with a focus on robustness to realistic channel and post-processing distortions.
- **Content:** Bona fide speech from multiple open Russian speech corpora and synthetic speech generated by 37 Russian-capable TTS and voice-cloning systems.
- **Structure:**
  - **Audio:** `.wav` files
  - **Metadata:** JSON with the fields `sample_id`, `label`, `group`, `subset`, `augmentation`, `filename`, `audio_relpath`, `source_audio`, `metadata_source`, `source_type`, `mos_pred`, `noi_pred`, `dis_pred`, `col_pred`, `loud_pred`, `cer`, `duration`, `speakers`, `model`, `transcribe`, `true_lines`, `transcription`, `ground_truth`, and `ops`.
| Field | Description |
| ----------------- | -------------------------------------------------------------------------------------------------------------------- |
| `sample_id` | Sample ID |
| `label` | `real` or `fake` |
| `group` | Sample group - `raw` or `augmented` |
| `subset` | Source subset name, e.g. `OpenSTT`, `GOLOS`, or `ElevenLabs` |
| `augmentation` | Applied augmentation |
| `filename` | Audio filename |
| `audio_relpath` | Relative path to audio |
| `source_audio` | Original audio for augmented sample |
| `metadata_source` | Metadata source |
| `source_type` | Source type - `tts`, `real_speech` or `augmented_audio` |
| `mos_pred` | Predicted MOS |
| `noi_pred` | Predicted noisiness |
| `dis_pred` | Predicted discontinuity |
| `col_pred` | Predicted coloration |
| `loud_pred` | Predicted loudness |
| `cer` | Character error rate |
| `duration` | Duration in seconds |
| `speakers` | Speaker info |
| `model` | Specific checkpoint or voice used for generation, e.g. `ESpeech-TTS-1_RL-V1`, `xtts-ru-ipa`, or `ru-RU-DmitryNeural` |
| `transcribe` | Automatic transcription |
| `true_lines` | Source text |
| `transcription` | Automatic transcription |
| `ground_truth` | Reference text |
| `ops` | Processing operations |
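As a hedged illustration of how these fields might be used, the sketch below filters a list of metadata dictionaries by field values. The three records are invented placeholders that only follow the field names and value sets listed in the table; the `sample_id` values and durations are hypothetical, and the helper name `select` is not part of any library.

```python
def select(records, **conditions):
    """Return the records whose fields equal every given condition."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in conditions.items())
    ]

# Placeholder records shaped like the metadata described above (values are made up).
records = [
    {"sample_id": "a1", "label": "real", "group": "raw", "subset": "GOLOS", "duration": 3.2},
    {"sample_id": "b2", "label": "fake", "group": "raw", "subset": "ElevenLabs", "duration": 4.1},
    {"sample_id": "c3", "label": "fake", "group": "augmented", "subset": "ElevenLabs", "duration": 4.1},
]

# Clean (non-augmented) spoofed samples only.
clean_fakes = select(records, label="fake", group="raw")

# Total duration in seconds per label.
seconds_by_label = {}
for r in records:
    seconds_by_label[r["label"]] = seconds_by_label.get(r["label"], 0.0) + r["duration"]
```

The same predicate could be passed to a `filter` call on a loaded dataset, assuming the metadata fields are exposed as columns.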
# Statistics
- **Number of TTS systems:** 37
- **Total spoof hours:** 691.68
- **Total bona-fide hours:** 234.07
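A quick worked calculation from the two figures above shows the overall size and the class balance by duration. Note that these are shares of audio hours, not of sample counts, which this card does not report.

```python
spoof_hours = 691.68
bona_fide_hours = 234.07

total_hours = spoof_hours + bona_fide_hours      # 925.75 hours in total
spoof_share = spoof_hours / total_hours          # fraction of audio time that is spoofed
spoof_to_real = spoof_hours / bona_fide_hours    # hours of spoofed audio per bona fide hour

print(f"total: {total_hours:.2f} h")
print(f"spoof share: {spoof_share:.1%}")
print(f"spoof-to-bona-fide ratio: {spoof_to_real:.2f}")
```

Roughly three quarters of the audio time is spoofed, so duration-weighted metrics will be dominated by the spoof class unless the classes are rebalanced or reported separately.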
Table 4. Anti-spoofing models on clean data (best result in bold, second best underlined)
| Model | Acc | Pr | Rec | F1 | RAUC | EER | t-DCF |
| ------------------------------------------------------------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | ------------------ |
| [AASIST3](https://huggingface.co/MTUCI/AASIST3) | 0.769±0.0006 | 0.683±0.001 | 0.769±0.0006 | 0.724±0.001 | 0.841±0.0006 | 0.231±0.0006 | 0.702±0.002 |
| [Arena-1B](https://huggingface.co/Speech-Arena-2025/DF_Arena_1B_V_1) | 0.812±0.001 | 0.736±0.001 | 0.812±0.001 | 0.772±0.001 | 0.887±0.0005 | 0.188±0.001 | <u>0.385±0.001</u> |
| [Arena-500M](https://huggingface.co/Speech-Arena-2025/DF_Arena_500M_V_1) | 0.801±0.001 | 0.722±0.001 | 0.801±0.001 | 0.760±0.001 | 0.864±0.0005 | 0.199±0.001 | 0.655±0.002 |
| [Nes2Net](https://github.com/Liu-Tianchi/Nes2Net) | 0.689±0.0007 | 0.589±0.001 | 0.689±0.0007 | 0.634±0.0008 | 0.779±0.0007 | 0.311±0.0007 | 0.696±0.001 |
| [Res2TCNGuard](https://github.com/mtuciru/Res2TCNGuard) | 0.627±0.001 | 0.520±0.001 | 0.627±0.001 | 0.569±0.001 | 0.691±0.001 | 0.373±0.001 | 0.918±0.001 |
| [ResCapsGuard](https://github.com/mtuciru/ResCapsGuard) | 0.677±0.001 | 0.575±0.001 | 0.677±0.001 | 0.622±0.001 | 0.718±0.001 | 0.323±0.001 | 0.896±0.001 |
| [SLS with XLS-R](https://github.com/QiShanZhang/SLSforASVspoof-2021-DF) | 0.779±0.001 | 0.700±0.001 | 0.779±0.001 | 0.737±0.001 | 0.859±0.001 | 0.221±0.001 | 0.650±0.001 |
| [Wav2Vec 2.0](https://github.com/TakHemlata/SSL_Anti-spoofing) | 0.772±0.0006 | 0.687±0.001 | 0.772±0.0006 | 0.727±0.001 | 0.850±0.0006 | 0.228±0.0006 | 0.558±0.002 |
| [TCM-ADD](https://github.com/ductuantruong/tcm_add) | <u>0.857±0.001</u> | <u>0.797±0.001</u> | <u>0.859±0.001</u> | <u>0.827±0.001</u> | <u>0.914±0.0004</u> | <u>0.143±0.001</u> | 0.424±0.001 |
| [Spectra-0](https://huggingface.co/MTUCI/spectra_0) | **0.962** | **0.942** | **0.962** | **0.952** | **0.985** | **0.038** | **0.124** |
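For readers unfamiliar with the EER column, the following is a minimal sketch of how an equal error rate can be approximated from detector scores. It assumes that a higher score means "more likely bona fide" and uses a simple threshold sweep; the function name and the toy scores are made up for illustration, and this is not the evaluation code behind the table.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for th in thresholds:
        # False rejection rate: bona fide samples scored below the threshold.
        frr = sum(s < th for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed samples scored at or above the threshold.
        far = sum(s >= th for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores (hypothetical): one bona fide sample scores low, one spoof scores high.
bona_fide = [0.9, 0.8, 0.7, 0.6, 0.3]
spoof = [0.1, 0.2, 0.4, 0.65, 0.05]
eer = equal_error_rate(bona_fide, spoof)
```

With finite data the two error curves rarely cross exactly, which is why the sketch averages the rates at the closest point; evaluation toolkits often interpolate along the ROC curve instead.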
# Download
## Using Datasets
```python
from datasets import load_dataset
ds = load_dataset("MTUCI/RuASD")
print(ds)
```
## Using Datasets with streaming mode
```python
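from datasets import load_dataset
# Without `split`, streaming returns an IterableDatasetDict keyed by split name,
# which has no `.take()`, so select one split first.
ds = load_dataset("MTUCI/RuASD", streaming=True)
split_name = next(iter(ds))  # first available split
small_ds = ds[split_name].take(1000)
print(small_ds)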
```
# Contact
- **Email:** [k.n.borodin@mtuci.ru](mailto:k.n.borodin@mtuci.ru)
- **Telegram channel:** [https://t.me/korallll_ai](https://t.me/korallll_ai)
# Citation
```
@unpublished{ruasd2026,
author = {},
title = {},
year = {}
}
```
# TTS and VC models
| Model | Link |
| --------------------- | -------------------------------------------------------------------------- |
| Espeech Podcaster | https://hf.co/ESpeech/ESpeech-TTS-1_podcaster |
| Espeech RL-V1 | https://hf.co/ESpeech/ESpeech-TTS-1_RL-V1 |
| Espeech RL-V2 | https://hf.co/ESpeech/ESpeech-TTS-1_RL-V2 |
| Espeech SFT-95k | https://hf.co/ESpeech/ESpeech-TTS-1_SFT-95K |
| Espeech SFT-256k | https://hf.co/ESpeech/ESpeech-TTS-1_SFT-256K |
| F5-TTS checkpoint | https://hf.co/Misha24-10/F5-TTS_RUSSIAN |
| F5-TTS checkpoint | https://hf.co/hotstone228/F5-TTS-Russian |
| VITS checkpoint | https://hf.co/joefox/tts_vits_ru_hf |
| PiperTTS | https://github.com/rhasspy/piper |
| TeraTTS-natasha | https://hf.co/TeraTTS/natasha-g2p-vits |
| TeraTTS-girl_nice | https://hf.co/TeraTTS/girl_nice-g2p-vits |
| TeraTTS-glados | https://hf.co/TeraTTS/glados-g2p-vits |
| TeraTTS-glados2 | https://hf.co/TeraTTS/glados2-g2p-vits |
| MMS | https://hf.co/facebook/mms-tts-rus |
| VITS checkpoint | https://hf.co/utrobinmv/tts_ru_free_hf_vits_low_multispeaker |
| VITS checkpoint | https://hf.co/utrobinmv/tts_ru_free_hf_vits_high_multispeaker |
| VITS2 checkpoint | https://hf.co/frappuccino/vits2_ru_natasha |
| GPT-SoVITS checkpoint | https://hf.co/alphacep/vosk-tts-ru-gpt-sovits |
| CoquiTTS | https://hf.co/coqui/XTTS-v2 |
| XTTS checkpoint | https://hf.co/NeuroDonu/RU-XTTS-DonuModel |
| XTTS checkpoint | https://hf.co/omogr/xtts-ru-ipa |
| Fastpitch IPA | https://hf.co/bene-ges/tts_ru_ipa_fastpitch_ruslan |
| Fastpitch BERT g2p | https://hf.co/bene-ges/ru_g2p_ipa_bert_large |
| RussianFastPitch | https://github.com/safonovanastya/RussianFastPitch |
| Bark | https://hf.co/suno/bark-small |
| GradTTS | https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS |
| FishTTS | https://hf.co/fishaudio/fish-speech-1.5 |
| Pyttsx3 | https://github.com/nateshmbhat/pyttsx3 |
| RHVoice | https://github.com/RHVoice/RHVoice |
| Silero | https://github.com/snakers4/silero-models |
| Fairseq Transformer | https://hf.co/facebook/tts_transformer-ru-cv7_css10 |
| SpeechT5 | https://hf.co/voxxer/speecht5_finetuned_commonvoice_ru_translit |
| Vosk-TTS | https://github.com/alphacep/vosk-tts |
| EdgeTTS | https://github.com/rany2/edge-tts |
| VK Cloud | https://cloud.vk.com/ |
| SaluteSpeech | https://developers.sber.ru/portal/products/smartspeech |
| ElevenLabs | https://elevenlabs.io/ |
**Provider:** MTUCI
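# Summary
## Dataset introduction
### Construction
RuASD is built by combining diverse data sources. Bona fide speech samples are drawn from multiple public Russian speech corpora, and spoofed speech is generated with 37 Russian-capable speech synthesis and voice-cloning systems. Raw audio is paired with augmented samples produced by reproducible perturbations such as reverberation, additive noise, and codec-induced channel distortion, so that evaluation covers both clean and degraded conditions.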
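### Features
RuASD covers a broad range of speech synthesis technology, from traditional parametric synthesis to recent neural voice cloning. Besides the audio itself, it provides rich metadata: sample identifiers, real/fake labels, source subsets, augmentation details, and predicted speech-quality metrics, which gives structured support for analyzing model behavior. The dataset falls in the 100K to 1M sample range and totals more than 900 hours of audio (691.68 hours spoofed and 234.07 hours bona fide), which makes it suitable for large-scale training and benchmarking.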
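### Usage
RuASD can be loaded with the Hugging Face `datasets` library, which also supports streaming so that the audio can be processed without downloading everything first. The dataset is suited to training and evaluating audio deepfake detection models, and users can divide it into training, validation, and test sets using the label and group fields in the metadata. The augmented samples support robustness testing, so models can be validated under simulated real-world distortions.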
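One way to build such splits is sketched below. It assigns each record to a split deterministically by hashing a key, and it keys augmented records on their originating file so that an augmented copy lands in the same split as its source. This is an assumption-laden sketch, not a split shipped with the dataset: it assumes augmented records carry the originating filename in `source_audio` and that raw records leave it empty, and the filenames and percentages are invented.

```python
import hashlib

def assign_split(record, val_pct=10, test_pct=10):
    """Deterministically map a metadata record to 'train', 'validation', or 'test'."""
    # Assumption: augmented records name their originating file in `source_audio`;
    # raw records fall back to their own `filename`.
    key = record.get("source_audio") or record["filename"]
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"

# Hypothetical records: a raw utterance and an augmented copy derived from it.
raw = {"filename": "utt_0001.wav", "group": "raw", "source_audio": None}
aug = {"filename": "utt_0001_noise.wav", "group": "augmented", "source_audio": "utt_0001.wav"}

raw_split = assign_split(raw)
aug_split = assign_split(aug)
```

Hashing rather than random sampling keeps the assignment stable across runs and machines. Splitting by speaker or by synthesis system instead would give a stricter test of generalization.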
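## Background and challenges
### Background overview
With the rapid development of deepfake technology, audio produced by speech synthesis and voice-cloning systems poses a serious security threat in areas such as identity verification and media content. RuASD was recently built in response by a research team from the Moscow Technical University of Communications and Informatics (MTUCI) and other institutions. It aims to provide a standardized benchmark for Russian-language anti-spoofing and deepfake detection systems, with a central focus on improving detector robustness under realistic channel interference and post-processing distortion. By combining bona fide recordings from multiple heterogeneous Russian speech corpora with spoofed speech from 37 Russian-capable synthesis systems, RuASD provides a data foundation for research and evaluation in this area.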
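### Current challenges
The main challenge in audio deepfake detection is separating highly realistic synthetic speech from genuine human speech. Generalization and robustness are tested hardest when detectors face diverse synthesis models, complex channel conditions, and post-processing operations. Building RuASD posed its own challenges:
- Multiple heterogeneous Russian corpora of bona fide speech had to be collected and reconciled while ensuring that the sources were legitimate and representative.
- Integrating as many as 37 speech synthesis and voice-cloning systems to generate spoofed samples involved complex model adaptation and data-generation workflows.
- To simulate real application scenarios, reproducible perturbations such as reverberation, additive noise, and codec distortion had to be introduced systematically, which requires careful experimental design and quality control.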
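## Common scenarios
### Classic use cases
RuASD serves as a benchmark for Russian-language speech anti-spoofing research. It combines spoofed speech from 37 Russian-capable synthesis systems with bona fide recordings from multiple Russian speech corpora, and it is used to train and evaluate audio deepfake detection models. Its varied speech samples and reproducible perturbation settings let researchers test systematically how well a model separates real from synthetic speech, particularly under realistic conditions such as reverberation, noise, and codec distortion.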
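### Practical applications
In practice, RuASD supports the development of speech security systems for Russian-speaking regions. Financial institutions and identity-verification services can use models trained on it to detect phone fraud or voice-impersonation attacks. Media platforms can apply the same technology to check the authenticity of audio content and limit the spread of false information. Because the spoofed samples come from many different synthesis systems, they approximate the deception methods likely to be met in the real world, which helps deployed systems cope with emerging audio forgery threats.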
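### Derived and related work
Several anti-spoofing models have been studied on RuASD. Spectra-0, for example, achieves strong detection performance on the dataset (0.038 EER on clean data), showing the potential of recent architectures in Russian-language settings. Models such as AASIST3 and Wav2Vec 2.0 have also been evaluated and adapted on it, which supports comparison of anti-spoofing techniques across languages. This work broadens the body of research on Russian speech security and contributes to the wider problem of audio deepfake detection.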
The summary above was collected and generated by 遇见数据集.



