Name: lab260/RuASD
Creator: lab260
Published: 2026-03-31 20:05:23
License: No description provided

Download link:

https://hf-mirror.com/datasets/lab260/RuASD

Resource description:

---
language:
- ru
tags:
- audio
- speech
- anti-spoofing
- audio-deepfake-detection
- tts
task_categories:
- audio-classification
pretty_name: RuASD
size_categories:
- 100K<n<1M
license: cc-by-nc-sa-4.0
---

# RuASD: Russian Anti-Spoofing Dataset

**RuASD** is a public Russian-language speech anti-spoofing dataset for developing and benchmarking audio deepfake detection systems. It combines spoofed utterances generated by 37 Russian-capable speech synthesis systems with bona fide recordings curated from multiple heterogeneous Russian speech corpora. In addition to clean audio, the dataset supports robustness-oriented evaluation through reproducible perturbations such as reverberation, additive noise, and codec-based channel degradation.

**Models:** ESpeech, F5-TTS, VITS, Piper, TeraTTS, MMS TTS, VITS2, GPT-SoVITS, CoquiTTS, XTTS, Fastpitch, RussianFastPitch, Bark, GradTTS, FishTTS, Pyttsx3, RHVoice, Silero, Fairseq Transformer, SpeechT5, Vosk-TTS, EdgeTTS, VK Cloud, SaluteSpeech, ElevenLabs

# Overview

- **Purpose:** Benchmark and develop Russian-language anti-spoofing and audio deepfake detection systems, with a focus on robustness to realistic channel and post-processing distortions.
- **Content:** Bona fide speech from multiple open Russian speech corpora and synthetic speech generated by 37 Russian-capable TTS and voice-cloning systems.
- **Structure:**
  - **Audio:** `.wav` files
  - **Metadata:** JSON with the fields `sample_id`, `label`, `group`, `subset`, `augmentation`, `filename`, `audio_relpath`, `source_audio`, `metadata_source`, `source_type`, `mos_pred`, `noi_pred`, `dis_pred`, `col_pred`, `loud_pred`, `cer`, `duration`, `speakers`, `model`, `transcribe`, `true_lines`, `transcription`, `ground_truth`, and `ops`.
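
The metadata fields listed above lend themselves to simple filtering and aggregation. The following is a minimal sketch, assuming the metadata JSON is a list of per-sample objects; the card does not state the exact file layout, and the records below are hypothetical values invented for illustration. It sums `duration` per `label` for samples in one `group`.

```python
import json
from collections import defaultdict

# Hypothetical records for illustration only; real values come from the
# dataset's metadata JSON, whose exact layout is an assumption here.
SAMPLE_JSON = """
[
  {"sample_id": "a1", "label": "real", "group": "raw", "subset": "GOLOS", "duration": 3.0},
  {"sample_id": "a2", "label": "fake", "group": "raw", "subset": "ElevenLabs", "duration": 4.5},
  {"sample_id": "a3", "label": "fake", "group": "augmented", "subset": "ElevenLabs", "duration": 4.5},
  {"sample_id": "a4", "label": "fake", "group": "raw", "subset": "ElevenLabs", "duration": 1.5}
]
"""


def seconds_by_label(records, group="raw"):
    """Sum `duration` (in seconds) per `label`, keeping only records in `group`."""
    totals = defaultdict(float)
    for rec in records:
        if rec.get("group") == group:
            totals[rec["label"]] += float(rec["duration"])
    return dict(totals)


records = json.loads(SAMPLE_JSON)
totals = seconds_by_label(records)
print(totals)
```

Dividing each total by 3600 gives hours. Passing `group="augmented"` restricts the sum to perturbed samples instead.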

| Field             | Description                                                                                                           |
| ----------------- | --------------------------------------------------------------------------------------------------------------------- |
| `sample_id`       | Sample ID                                                                                                             |
| `label`           | `real` or `fake`                                                                                                      |
| `group`           | Sample group: `raw` or `augmented`                                                                                    |
| `subset`          | Source subset name, e.g. `OpenSTT`, `GOLOS`, or `ElevenLabs`                                                          |
| `augmentation`    | Applied augmentation                                                                                                  |
| `filename`        | Audio filename                                                                                                        |
| `audio_relpath`   | Relative path to audio                                                                                                |
| `source_audio`    | Original audio for augmented sample                                                                                   |
| `metadata_source` | Metadata source                                                                                                       |
| `source_type`     | Source type: `tts`, `real_speech`, or `augmented_audio`                                                               |
| `mos_pred`        | Predicted MOS                                                                                                         |
| `noi_pred`        | Predicted noisiness                                                                                                   |
| `dis_pred`        | Predicted discontinuity                                                                                               |
| `col_pred`        | Predicted coloration                                                                                                  |
| `loud_pred`       | Predicted loudness                                                                                                    |
| `cer`             | Character error rate                                                                                                  |
| `duration`        | Duration in seconds                                                                                                   |
| `speakers`        | Speaker info                                                                                                          |
| `model`           | Specific checkpoint or voice used for generation, e.g. `ESpeech-TTS-1_RL-V1`, `xtts-ru-ipa`, or `ru-RU-DmitryNeural` |
| `transcribe`      | Automatic transcription                                                                                               |
| `true_lines`      | Source text                                                                                                           |
| `transcription`   | Automatic transcription                                                                                               |
| `ground_truth`    | Reference text                                                                                                        |
| `ops`             | Processing operations                                                                                                 |

# Statistics

- **Number of TTS systems:** 37
- **Total spoof hours:** 691.68
- **Total bona-fide hours:** 234.07

Table 4. Antispoofing models on clean data

| Model | Acc | Pr | Rec | F1 | RAUC | EER | t-DCF |
| ----- | --- | -- | --- | -- | ---- | --- | ----- |
| [AASIST3](https://huggingface.co/MTUCI/AASIST3) | 0.769±0.0006 | 0.683±0.001 | 0.769±0.0006 | 0.724±0.001 | 0.841±0.0006 | 0.231±0.0006 | 0.702±0.002 |
| [Arena-1B](https://huggingface.co/Speech-Arena-2025/DF_Arena_1B_V_1) | 0.812±0.001 | 0.736±0.001 | 0.812±0.001 | 0.772±0.001 | 0.887±0.0005 | 0.188±0.001 | 0.385±0.001 |
| [Arena-500M](https://huggingface.co/Speech-Arena-2025/DF_Arena_500M_V_1) | 0.801±0.001 | 0.722±0.001 | 0.801±0.001 | 0.760±0.001 | 0.864±0.0005 | 0.199±0.001 | 0.655±0.002 |
| [Nes2Net](https://github.com/Liu-Tianchi/Nes2Net) | 0.689±0.0007 | 0.589±0.001 | 0.689±0.0007 | 0.634±0.0008 | 0.779±0.0007 | 0.311±0.0007 | 0.696±0.001 |
| [Res2TCNGuard](https://github.com/mtuciru/Res2TCNGuard) | 0.627±0.001 | 0.520±0.001 | 0.627±0.001 | 0.569±0.001 | 0.691±0.001 | 0.373±0.001 | 0.918±0.001 |
| [ResCapsGuard](https://github.com/mtuciru/ResCapsGuard) | 0.677±0.001 | 0.575±0.001 | 0.677±0.001 | 0.622±0.001 | 0.718±0.001 | 0.323±0.001 | 0.896±0.001 |
| [SLS with XLS-R](https://github.com/QiShanZhang/SLSforASVspoof-2021-DF) | 0.779±0.001 | 0.700±0.001 | 0.779±0.001 | 0.737±0.001 | 0.859±0.001 | 0.221±0.001 | 0.650±0.001 |
| [Wav2Vec 2.0](https://github.com/TakHemlata/SSL_Anti-spoofing) | 0.772±0.0006 | 0.687±0.001 | 0.772±0.0006 | 0.727±0.001 | 0.850±0.0006 | 0.228±0.0006 | 0.558±0.002 |
| [TCM-ADD](https://github.com/ductuantruong/tcm_add) | 0.857±0.001 | 0.797±0.001 | 0.859±0.001 | 0.827±0.001 | 0.914±0.0004 | 0.143±0.001 | 0.424±0.001 |
| [Spectra-0](https://huggingface.co/MTUCI/spectra_0) | **0.962** | **0.942** | **0.962** | **0.952** | **0.985** | **0.038** | **0.124** |

# Download

## Using Datasets

```python
from datasets import load_dataset

ds = load_dataset("MTUCI/RuASD")
print(ds)
```

## Using Datasets with streaming mode

```python
from datasets import load_dataset

# Without a `split` argument, streaming mode returns a dict of iterable splits.
ds = load_dataset("MTUCI/RuASD", streaming=True)
# Take the first available split rather than assuming its name.
first_split = next(iter(ds.values()))
small_ds = first_split.take(1000)
print(small_ds)
```

# Contact

- **Email:** [k.n.borodin@mtuci.ru](mailto:k.n.borodin@mtuci.ru)
- **Telegram channel:** [https://t.me/korallll_ai](https://t.me/korallll_ai)

# Citation

```
@unpublished{ruasd2026,
  author = {},
  title = {},
  year = {}
}
```

# TTS and VC models

| Model                 | Link                                                               |
| --------------------- | ------------------------------------------------------------------ |
| Espeech Podcaster     | https://hf.co/ESpeech/ESpeech-TTS-1_podcaster                      |
| Espeech RL-V1         | https://hf.co/ESpeech/ESpeech-TTS-1_RL-V1                          |
| Espeech RL-V2         | https://hf.co/ESpeech/ESpeech-TTS-1_RL-V1                          |
| Espeech SFT-95k       | https://hf.co/ESpeech/ESpeech-TTS-1_SFT-95K                        |
| Espeech SFT-256k      | https://hf.co/ESpeech/ESpeech-TTS-1_SFT-256K                       |
| F5-TTS checkpoint     | https://hf.co/Misha24-10/F5-TTS_RUSSIAN                            |
| F5-TTS checkpoint     | https://hf.co/hotstone228/F5-TTS-Russian                           |
| VITS checkpoint       | https://hf.co/joefox/tts_vits_ru_hf                                |
| PiperTTS              | https://github.com/rhasspy/piper                                   |
| TeraTTS-natasha       | https://hf.co/TeraTTS/natasha-g2p-vits                             |
| TeraTTS-girl_nice     | https://hf.co/TeraTTS/girl_nice-g2p-vits                           |
| TeraTTS-glados        | https://hf.co/TeraTTS/glados-g2p-vits                              |
| TeraTTS-glados2       | https://hf.co/TeraTTS/glados2-g2p-vits                             |
| MMS                   | https://hf.co/facebook/mms-tts-rus                                 |
| VITS checkpoint       | https://hf.co/utrobinmv/tts_ru_free_hf_vits_low_multispeaker       |
| VITS checkpoint       | https://hf.co/utrobinmv/tts_ru_free_hf_vits_high_multispeaker      |
| VITS2 checkpoint      | https://hf.co/frappuccino/vits2_ru_natasha                         |
| GPT-SoVITS checkpoint | https://hf.co/alphacep/vosk-tts-ru-gpt-sovits                      |
| CoquiTTS              | https://hf.co/coqui/XTTS-v2                                        |
| XTTS checkpoint       | https://hf.co/NeuroDonu/RU-XTTS-DonuModel                          |
| XTTS checkpoint       | https://hf.co/omogr/xtts-ru-ipa                                    |
| Fastpitch IPA         | https://hf.co/bene-ges/tts_ru_ipa_fastpitch_ruslan                 |
| Fastpitch BERT g2p    | https://hf.co/bene-ges/ru_g2p_ipa_bert_large                       |
| RussianFastPitch      | https://github.com/safonovanastya/RussianFastPitch                 |
| Bark                  | https://hf.co/suno/bark-small                                      |
| GradTTS               | https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS |
| FishTTS               | https://hf.co/fishaudio/fish-speech-1.5                            |
| Pyttsx3               | https://github.com/nateshmbhat/pyttsx3                             |
| RHVoice               | https://github.com/RHVoice/RHVoice                                 |
| Silero                | https://github.com/snakers4/silero-models                          |
| Fairseq Transformer   | https://hf.co/facebook/tts_transformer-ru-cv7_css10                |
| SpeechT5              | https://hf.co/voxxer/speecht5_finetuned_commonvoice_ru_translit    |
| Vosk-TTS              | https://github.com/alphacep/vosk-tts                               |
| EdgeTTS               | https://github.com/rany2/edge-tts                                  |
| VK Cloud              | https://cloud.vk.com/                                              |
| SaluteSpeech          | https://developers.sber.ru/portal/products/smartspeech             |
| ElevenLabs            | https://elevenlabs.io/                                             |
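
Table 4 reports an equal error rate (EER) for each detector, but the card does not describe how it was computed. As a rough illustration only, the sketch below estimates EER from detector scores by sweeping thresholds and picking the point where the false acceptance and false rejection rates are closest. The function name `compute_eer`, the score convention (higher means more likely bona fide), and the label encoding (1 = bona fide, 0 = spoof) are assumptions of this example, not the authors' evaluation protocol.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate by a threshold sweep.

    Assumed conventions (not taken from the dataset card):
    higher score = more likely bona fide; label 1 = bona fide, 0 = spoof.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bona fide and one spoof score")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        far = sum(s >= t for s in neg) / len(neg)  # spoof accepted as bona fide
        frr = sum(s < t for s in pos) / len(pos)   # bona fide rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores, invented for illustration.
separable = compute_eer([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
overlapping = compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
print(separable, overlapping)
```

With few samples the two error rates may never cross exactly, so the sketch returns their average at the closest threshold. Evaluation toolkits typically interpolate along the ROC curve instead.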

Application scenarios: