
moonscape-software/Synthetic_Speech_Atlas_Public

Hugging Face | Updated 2026-03-31 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/moonscape-software/Synthetic_Speech_Atlas_Public
Resource description:
---
license: cc-by-sa-4.0
task_categories:
- audio-classification
- feature-extraction
language:
- en
- ja
tags:
- speech
- deepfake-detection
- anti-spoofing
- acoustic-features
- audio-forensics
- synthetic-speech
- neural-vocoder
- wavefake
pretty_name: Moonscape SSA — WaveFake v1.2 Acoustic Features
size_categories:
- 100K<n<1M
---

# Moonscape SSA — WaveFake v1.2 Acoustic Features

**Moonscape Software — Synthetic Speech Atlas Project**

Pre-extracted acoustic feature matrix for WaveFake v1.2, derived from the `df_arena_v1` pipeline (Parselmouth/Praat + Brouhaha + custom physics-motivated features).

Part of the **Moonscape Synthetic Speech Atlas (SSA)**, a curated corpus for classical speech deepfake detection research.

Audio files are **not included**. This file contains 57 hand-crafted signal-processing features per clip, ready for use with LightGBM, XGBoost, logistic regression, and similar lightweight classifiers.

This file is released under **CC-BY-SA-4.0**, consistent with the WaveFake v1.2 source licence. Derivative works must carry the same licence.

---

## File

| File | Rows | Bonafide | Spoof | Licence |
|------|------|----------|-------|---------|
| `ssa_wavefake_features_v2.parquet` | ~134,266 | 0 | ~134,266 | CC-BY-SA-4.0 |

### Why Spoof-Only?

WaveFake v1.2 is built on LJSpeech (bonafide) and JSUT (bonafide) source audio. The LJSpeech bonafide clips are a **single speaker** with no speaker-diversity metadata suitable for K-anonymisation. Including them alongside vocoder-processed versions in the same export would create an implicit re-identification pathway (feature vectors sufficiently close to the single known speaker). The bonafide source is therefore excluded from this export by design.

If you need bonafide reference audio for LJSpeech, use the original LJSpeech dataset directly (public domain).

---

## Vocoders Covered

WaveFake v1.2 covers 9 neural vocoder architectures across two source corpora:

| Vocoder | Source Corpus | Architecture Family |
|---------|---------------|---------------------|
| `melgan` | LJSpeech | MelGAN |
| `full_band_melgan` | LJSpeech | MelGAN (full-band) |
| `multi_band_melgan` | LJSpeech | MelGAN (multi-band) |
| `parallel_wavegan` | LJSpeech | Parallel WaveGAN |
| `hifiGAN` | LJSpeech | HiFi-GAN |
| `waveglow` | LJSpeech | WaveGlow (flow-based) |
| `wavegrad` | LJSpeech | WaveGrad (diffusion) |
| `melgan_large` | JSUT | MelGAN (large) |
| `full_band_melgan_large` | JSUT | MelGAN full-band (large) |

The `vocoder` column in the parquet identifies the architecture for each row.

**Notable finding (JSUT mora-timing):** JSUT-based vocoders show `npvi` ≈ 38 (Japanese mora-timed rhythm), compared to LJSpeech-based vocoders at `npvi` ≈ 55 (English stress-timed). This is because neural vocoders reconstruct waveforms from source mel spectrograms: they inherit the rhythm of the source audio rather than generating it from text. This makes JSUT vocoders an important control case for nPVI-based detection research.

---

## Feature Schema

### Metadata Columns

| Column | Type | Description |
|--------|------|-------------|
| `anon_id` | int | Anonymous sequential ID (post-shuffle) |
| `label` | string | Always `spoof` in this export |
| `vocoder` | string | Neural vocoder architecture name |
| `tier` | string | Brouhaha quality tier (1=studio, 2=near-field) |
| `brouhaha_graded` | int | Always 1 (full Brouhaha coverage) |
| `source_licence` | string | `CC-BY-SA-4.0` |
| `source_dataset` | string | `WaveFake` |
| `duration_ms` | float32 | Clip duration (ms) |
| `duration_s` | float32 | Clip duration (seconds) |

### Tier 1 — Standard DSP Features (float32, FP16 watermarked)

37 standard acoustic features spanning prosody, voice quality, formants, spectral texture, and physics-motivated measures.
All values are rounded to two decimal places, then re-inflated to FP16 resolution with deterministic HMAC-SHA256-seeded noise (an IP watermark; see the Provenance Watermark section below).

| Column | Units | Description |
|--------|-------|-------------|
| `snr_median` | dB | Median SNR (Brouhaha) |
| `snr_mean` | dB | Mean SNR (Brouhaha) |
| `c50_median` | dB | Median room clarity C50 |
| `speech_ratio` | 0–1 | Active speech proportion |
| `pitch_mean` | Hz | Mean F0 (voiced frames) |
| `pitch_std` | Hz | F0 standard deviation |
| `pitch_range` | Hz | F0 max–min range |
| `npvi` | — | Normalised Pairwise Variability Index (rhythm) |
| `intensity_mean` | dB | Mean RMS intensity |
| `intensity_max` | dB | Peak intensity |
| `intensity_range` | dB | Dynamic range (peak – minimum) |
| `intensity_velocity_max` | dB/frame | Max rate of intensity change |
| `jitter_local` | % | Cycle-to-cycle period perturbation |
| `shimmer_local` | % | Cycle-to-cycle amplitude perturbation |
| `hnr_mean` | dB | Harmonics-to-noise ratio |
| `cpps` | dB | Cepstral peak prominence, smoothed |
| `hnr_c50_ratio` | — | HNR adjusted for room acoustics |
| `cpps_snr_ratio` | — | CPPS normalised for noise floor |
| `spectral_centroid_mean` | Hz | Mean spectral brightness |
| `spectral_tilt` | — | HF vs LF energy slope |
| `mfcc_delta_mean` | — | Mean first-order MFCC delta |
| `mfcc_high_variance` | — | Upper MFCC band variance (bands 12–20) |
| `zcr_mean` | — | Mean zero-crossing rate |
| `teo_mean` | — | Mean Teager-Kaiser Energy Operator |
| `teo_std` | — | TEO temporal standard deviation |
| `f1_mean` | Hz | Mean first formant |
| `f2_mean` | Hz | Mean second formant |
| `f3_mean` | Hz | Mean third formant |
| `formant_dispersion` | Hz | F3–F1 vocal tract length proxy |
| `articulation_rate` | syl/s | Estimated syllables per second |
| `phoneme_count` | — | Estimated phoneme count |
| `emotion_score` | 0–1 | Affective charge heuristic |
| `spectral_7k8k_entropy` | bits | 7–8 kHz entropy; NaN = codec gate |
| `fam_75hz_sharpness` | — | Acoustic mode sharpness at 75 Hz |
| `fam_86hz_sharpness` | — | Acoustic mode sharpness at 86 Hz |
| `drr_hf_lf_slope_ratio` | — | Direct-to-reverberant HF/LF slope |

### Tier 2 — Biomechanical Features (Z-score only, raw permanently dropped)

11 proprietary physics-motivated features, published only as Z-scores against the bonafide baseline. Raw values are dropped at export and are not available under any access tier. See the gated SSA corpus for context on why these are obfuscated.

| Column | Description |
|--------|-------------|
| `bico_f0_f1_z` | Bicoherence F0–F1 phase coupling Z-score |
| `bico_f1_f2_z` | Bicoherence F1–F2 phase coupling Z-score |
| `modgd_var_z` | Modified group delay variance Z-score |
| `pgv_magnitude_correlation_z` | Phase group velocity correlation Z-score |
| `pgv_total_z` | Total phase group velocity energy Z-score |
| `f1_velocity_z` | F1 transition rate Z-score |
| `f2_velocity_z` | F2 transition rate Z-score |
| `inertial_decay_residual_z` | Biomechanical inertia decay residual Z-score |
| `teo_std_high_z` | TEO high-band standard deviation Z-score |
| `teo_std_low_z` | TEO low-band standard deviation Z-score |
| `pitch_velocity_max_z` | Max F0 rate-of-change Z-score |

Because this export is spoof-only, the Z-scores are computed against the bonafide population of the full SSA corpus (LibriSeVoc/ITW/FoR/SONAR combined), not against any rows in this file.

---

## Provenance Watermark

All float features carry a deterministic FP16-resolution IP watermark:

```
seed  = HMAC-SHA256(SSA_EXPORT_SECRET, col_name + "|SSA_v2_2026")
noise ~ Uniform(-0.004, +0.004) seeded by seed
value = float16(round(raw, 2) + noise)
```

The watermark is reproducible and verifiable. Any extracted subset is traceable to this export version. Do not attempt to remove or alter the watermark; this is a condition of use.
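
The three-line recipe above can be sketched in Python to show how a per-column HMAC seed makes the noise deterministic. This is a minimal illustration under stated assumptions, not the export pipeline itself: the function name `watermark_column`, the demo key, the use of the first 8 digest bytes as an RNG seed, and the choice of NumPy's default generator are all hypothetical, so the output will not match the noise in the published file.

```python
import hashlib
import hmac

import numpy as np


def watermark_column(raw, col_name, secret, version="SSA_v2_2026"):
    """Round to 2 dp, add HMAC-seeded uniform noise, cast to float16.

    Hypothetical helper: the real SSA_EXPORT_SECRET and the way the
    digest is turned into a random stream are not documented here.
    """
    # seed = HMAC-SHA256(secret, col_name + "|" + version)
    message = f"{col_name}|{version}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()

    # Assumption: use the first 8 bytes of the digest as an integer seed.
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)

    # noise ~ Uniform(-0.004, +0.004), one draw per row
    values = np.asarray(raw, dtype=np.float64)
    noise = rng.uniform(-0.004, 0.004, size=values.shape)

    # value = float16(round(raw, 2) + noise)
    return (np.round(values, 2) + noise).astype(np.float16)


if __name__ == "__main__":
    demo_key = b"not-the-real-secret"  # placeholder key for illustration only
    first = watermark_column([0.1234, 1.5, 2.345], "jitter_local", demo_key)
    second = watermark_column([0.1234, 1.5, 2.345], "jitter_local", demo_key)
    print(first, np.array_equal(first, second))
```

Because the seed depends only on the key, the column name, and the version string, repeating the call with the same inputs yields the same values, which is the property that makes a watermark of this kind reproducible.
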

---

## Usage Example

```python
import pandas as pd

df = pd.read_parquet('ssa_wavefake_features_v2.parquet')
print(df.shape)                      # (~134266, 57)
print(df['vocoder'].value_counts())  # breakdown by architecture
print(df['npvi'].describe())         # LJSpeech ~55, JSUT ~38

# Split by source corpus via vocoder name
jsut_vocoders = {'melgan_large', 'full_band_melgan_large'}
df_jsut = df[df['vocoder'].isin(jsut_vocoders)]
df_ljspeech = df[~df['vocoder'].isin(jsut_vocoders)]
print(f"JSUT nPVI mean: {df_jsut['npvi'].mean():.1f}")
print(f"LJSpeech nPVI mean: {df_ljspeech['npvi'].mean():.1f}")

# Tier 2 Z-score features
tier2 = [c for c in df.columns if c.endswith('_z')]
print(df[tier2].describe())
```

---

## Licence & Attribution

This file is released under **CC-BY-SA-4.0**, consistent with the WaveFake v1.2 source dataset licence.

You are free to share and adapt this material for any purpose, including commercial, under the following terms:

- **Attribution**: Credit Frank & Schönherr (2021) for WaveFake, and Moonscape Software for the SSA feature extraction pipeline.
- **ShareAlike**: Derivatives must carry the same CC-BY-SA-4.0 licence.

Feature extraction code and Tier 2 biomechanical feature methodology are copyright Moonscape Software. The ShareAlike obligation applies to the dataset, not the extraction software.

---

## Citation

```bibtex
@dataset{kleingertner2026ssa_wavefake,
  author    = {Kleingertner, Chris},
  title     = {Moonscape SSA — WaveFake v1.2 Acoustic Features},
  year      = {2026},
  publisher = {Moonscape Software},
  url       = {https://huggingface.co/datasets/moonscape-software/Synthetic_Speech_Atlas},
  license   = {CC-BY-SA-4.0}
}

@inproceedings{frank2021wavefake,
  title     = {WaveFake: A Dataset to Facilitate Audio Deepfake Detection},
  author    = {Frank, Joel and Schönherr, Lea},
  booktitle = {NeurIPS Datasets and Benchmarks},
  year      = {2021}
}

@inproceedings{lavechin2022brouhaha,
  title     = {Brouhaha: Multi-task Training for Noise Speech Detection and Assessment},
  author    = {Lavechin, Marvin and others},
  booktitle = {Interspeech},
  year      = {2022}
}
```

---

*Moonscape Software — Synthetic Speech Atlas*
*Export version: SSA_v2_2026 | Watermark: HMAC-SHA256 seeded FP16 noise*
Provider:
moonscape-software