moonscape-software/Synthetic_Speech_Atlas_Public
Hugging Face · updated 2026-03-31 · indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/moonscape-software/Synthetic_Speech_Atlas_Public
Resource summary:
---
license: cc-by-sa-4.0
task_categories:
- audio-classification
- feature-extraction
language:
- en
- ja
tags:
- speech
- deepfake-detection
- anti-spoofing
- acoustic-features
- audio-forensics
- synthetic-speech
- neural-vocoder
- wavefake
pretty_name: Moonscape SSA — WaveFake v1.2 Acoustic Features
size_categories:
- 100K<n<1M
---
# Moonscape SSA — WaveFake v1.2 Acoustic Features
**Moonscape Software — Synthetic Speech Atlas Project**
Pre-extracted acoustic feature matrix for WaveFake v1.2, derived from the
`df_arena_v1` pipeline (Parselmouth/Praat + Brouhaha + custom physics-motivated
features). Part of the **Moonscape Synthetic Speech Atlas (SSA)** — a curated
corpus for classical speech deepfake detection research.
Audio files are **not included**. Each clip is one row of 57 columns: 9 metadata columns
plus 48 hand-crafted signal-processing features (37 Tier 1, 11 Tier 2), ready for use with
LightGBM, XGBoost, logistic regression, and similar lightweight classifiers.
This file is released under **CC-BY-SA-4.0**, consistent with the WaveFake v1.2
source licence. Derivative works must carry the same licence.
---
## File
| File | Rows | Bonafide | Spoof | Licence |
|------|------|----------|-------|---------|
| `ssa_wavefake_features_v2.parquet` | ~134,266 | 0 | ~134,266 | CC-BY-SA-4.0 |
### Why Spoof-Only?
WaveFake v1.2 is built from two bonafide source corpora: LJSpeech (English) and JSUT (Japanese).
The LJSpeech bonafide clips come from a **single speaker** and carry no speaker-diversity
metadata suitable for k-anonymisation. Exporting them alongside their vocoder-processed
versions would create an implicit re-identification pathway, because the bonafide feature
vectors would sit close to the single known speaker. The bonafide source audio is therefore
excluded from this export by design.
If you need bonafide reference audio for LJSpeech, use the original LJSpeech
dataset directly (public domain).
---
## Vocoders Covered
WaveFake v1.2 covers 9 neural vocoder architectures across two source corpora:
| Vocoder | Source Corpus | Architecture Family |
|---------|--------------|-------------------|
| `melgan` | LJSpeech | MelGAN |
| `full_band_melgan` | LJSpeech | MelGAN (full-band) |
| `multi_band_melgan` | LJSpeech | MelGAN (multi-band) |
| `parallel_wavegan` | LJSpeech | Parallel WaveGAN |
| `hifiGAN` | LJSpeech | HiFi-GAN |
| `waveglow` | LJSpeech | WaveGlow (flow-based) |
| `wavegrad` | LJSpeech | WaveGrad (diffusion) |
| `melgan_large` | JSUT | MelGAN (large) |
| `full_band_melgan_large` | JSUT | MelGAN full-band (large) |
The `vocoder` column in the parquet identifies the architecture for each row.
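To attach each row's source corpus and architecture family, a lookup transcribed from the table above can be mapped over `vocoder`. This is a minimal pandas sketch assuming the column holds exactly the identifiers listed; `VOCODER_INFO`, `annotate_vocoders`, and the derived `source_corpus` and `family` columns are hypothetical names, not part of the export.

```python
import pandas as pd

# (source corpus, architecture family) per vocoder, transcribed from the table
VOCODER_INFO = {
    "melgan": ("LJSpeech", "MelGAN"),
    "full_band_melgan": ("LJSpeech", "MelGAN (full-band)"),
    "multi_band_melgan": ("LJSpeech", "MelGAN (multi-band)"),
    "parallel_wavegan": ("LJSpeech", "Parallel WaveGAN"),
    "hifiGAN": ("LJSpeech", "HiFi-GAN"),
    "waveglow": ("LJSpeech", "WaveGlow (flow-based)"),
    "wavegrad": ("LJSpeech", "WaveGrad (diffusion)"),
    "melgan_large": ("JSUT", "MelGAN (large)"),
    "full_band_melgan_large": ("JSUT", "MelGAN full-band (large)"),
}


def annotate_vocoders(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hypothetical source_corpus and family columns.

    Vocoder names missing from VOCODER_INFO map to NaN so they are easy to spot.
    """
    out = df.copy()
    out["source_corpus"] = out["vocoder"].map({k: v[0] for k, v in VOCODER_INFO.items()})
    out["family"] = out["vocoder"].map({k: v[1] for k, v in VOCODER_INFO.items()})
    return out
```

Unrecognised vocoder strings come back as NaN rather than raising, so `annotate_vocoders(df)["source_corpus"].isna().sum()` doubles as a check that the lookup covers every row.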
**Notable finding: JSUT mora timing**
JSUT-based vocoders show `npvi` ≈ 38, consistent with Japanese mora-timed rhythm, while
LJSpeech-based vocoders sit at `npvi` ≈ 55, consistent with English stress-timed rhythm. The
gap arises because neural vocoders reconstruct waveforms from mel spectrograms of the source
audio: they inherit the rhythm of the source recording rather than generating it from text.
JSUT vocoders are therefore an important control case for nPVI-based detection research,
since their nPVI reflects the source language rather than the vocoder.
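For reference, the standard Grabe and Low formulation of nPVI over successive interval durations d_1 … d_m is 100/(m−1) · Σ |d_k − d_{k+1}| / ((d_k + d_{k+1})/2). Below is a stdlib-only sketch of that formula. The card does not say which intervals (vocalic, consonantal, or syllabic) `df_arena_v1` measures, so treat the input as an assumption; the example durations are made up.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index over successive interval durations.

    durations: positive interval lengths in any consistent unit (e.g. ms).
    Returns 100 * mean over adjacent pairs of |d_k - d_{k+1}| / mean(d_k, d_{k+1}).
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    if any(d <= 0 for d in durations):
        raise ValueError("interval durations must be positive")
    pair_terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations[:-1], durations[1:])
    ]
    return 100 * sum(pair_terms) / len(pair_terms)


# Hypothetical interval durations (ms), for illustration only
print(npvi([120, 120, 120, 120]))  # perfectly even intervals -> 0.0
print(round(npvi([150, 50, 150, 50]), 1))  # strict long/short alternation -> 100.0
```

Even spacing drives nPVI toward 0 and long/short alternation pushes it up, which is the direction of the JSUT (≈ 38) versus LJSpeech (≈ 55) contrast described above.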
---
## Feature Schema
### Metadata Columns
| Column | Type | Description |
|--------|------|-------------|
| `anon_id` | int | Anonymous sequential ID (post-shuffle) |
| `label` | string | Always `spoof` in this export |
| `vocoder` | string | Neural vocoder architecture name |
| `tier` | string | Brouhaha quality tier (1=studio, 2=near-field) |
| `brouhaha_graded` | int | Always 1 — full Brouhaha coverage |
| `source_licence` | string | `CC-BY-SA-4.0` |
| `source_dataset` | string | `WaveFake` |
| `duration_ms` | float32 | Clip duration (ms) |
| `duration_s` | float32 | Clip duration (seconds) |
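Several metadata columns are documented as constant, which allows a quick integrity check after loading. A minimal pandas sketch; the function name `check_metadata` is hypothetical, the uniqueness of `anon_id` is assumed from "sequential ID", and the 1 ms tolerance between `duration_ms` and `duration_s` is an assumed allowance for float32 rounding.

```python
import numpy as np
import pandas as pd


def check_metadata(df: pd.DataFrame, tol_ms: float = 1.0) -> list[str]:
    """Return a list of problems found in the metadata columns (empty if none)."""
    problems = []
    # Constant-valued columns as documented in the schema table
    expected = {
        "label": "spoof",
        "brouhaha_graded": 1,
        "source_licence": "CC-BY-SA-4.0",
        "source_dataset": "WaveFake",
    }
    for col, value in expected.items():
        if not (df[col] == value).all():
            problems.append(f"{col}: values other than {value!r}")
    # The two duration columns should describe the same clip length
    diff = np.abs(
        df["duration_ms"].astype("float64") - 1000 * df["duration_s"].astype("float64")
    )
    if (diff > tol_ms).any():
        problems.append(f"duration_ms and duration_s disagree by more than {tol_ms} ms")
    # Assumption: anonymous sequential IDs are unique
    if df["anon_id"].duplicated().any():
        problems.append("anon_id: duplicate IDs")
    return problems
```

An empty list means the loaded frame matches the documented invariants.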
### Tier 1 — Standard DSP Features (float32, FP16 watermarked)
37 standard acoustic features spanning prosody, voice quality, formants, spectral texture,
and physics-motivated measures. Every value is rounded to 2 decimal places, offset by
deterministic noise seeded with HMAC-SHA256, and stored at FP16 resolution (an IP watermark;
see Provenance Watermark below).
| Column | Units | Description |
|--------|-------|-------------|
| `snr_median` | dB | Median SNR (Brouhaha) |
| `snr_mean` | dB | Mean SNR (Brouhaha) |
| `c50_median` | dB | Median room clarity C50 |
| `speech_ratio` | 0–1 | Active speech proportion |
| `pitch_mean` | Hz | Mean F0 (voiced frames) |
| `pitch_std` | Hz | F0 standard deviation |
| `pitch_range` | Hz | F0 max–min range |
| `npvi` | — | Normalised Pairwise Variability Index (rhythm) |
| `intensity_mean` | dB | Mean RMS intensity |
| `intensity_max` | dB | Peak intensity |
| `intensity_range` | dB | Dynamic range (peak – minimum) |
| `intensity_velocity_max` | dB/frame | Max rate of intensity change |
| `jitter_local` | % | Cycle-to-cycle period perturbation |
| `shimmer_local` | % | Cycle-to-cycle amplitude perturbation |
| `hnr_mean` | dB | Harmonics-to-noise ratio |
| `cpps` | dB | Cepstral peak prominence, smoothed |
| `hnr_c50_ratio` | — | HNR adjusted for room acoustics |
| `cpps_snr_ratio` | — | CPPS normalised for noise floor |
| `spectral_centroid_mean` | Hz | Mean spectral brightness |
| `spectral_tilt` | — | HF vs LF energy slope |
| `mfcc_delta_mean` | — | Mean first-order MFCC delta |
| `mfcc_high_variance` | — | Upper MFCC band variance (bands 12–20) |
| `zcr_mean` | — | Mean zero-crossing rate |
| `teo_mean` | — | Mean Teager-Kaiser Energy Operator |
| `teo_std` | — | TEO temporal standard deviation |
| `f1_mean` | Hz | Mean first formant |
| `f2_mean` | Hz | Mean second formant |
| `f3_mean` | Hz | Mean third formant |
| `formant_dispersion` | Hz | F3–F1 vocal tract length proxy |
| `articulation_rate` | syl/s | Estimated syllables per second |
| `phoneme_count` | — | Estimated phoneme count |
| `emotion_score` | 0–1 | Affective charge heuristic |
| `spectral_7k8k_entropy` | bits | 7–8 kHz band entropy; NaN = codec gate |
| `fam_75hz_sharpness` | — | Acoustic mode sharpness at 75 Hz |
| `fam_86hz_sharpness` | — | Acoustic mode sharpness at 86 Hz |
| `drr_hf_lf_slope_ratio` | — | Direct-to-reverberant HF/LF slope |
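Several Tier 1 columns rest on textbook definitions: the discrete Teager-Kaiser energy operator Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), the zero-crossing rate, and Praat's local jitter and shimmer (mean absolute difference between consecutive periods, or consecutive peak amplitudes, divided by their mean). The numpy sketch below implements those definitions on a whole signal or on precomputed period/amplitude sequences. The framing, band splitting, and pitch-tracking settings of `df_arena_v1` are not documented, and Praat additionally applies period floor/ceiling limits, so outputs are not expected to match the exported columns exactly; the function names are hypothetical.

```python
import numpy as np


def teager_kaiser(x: np.ndarray) -> np.ndarray:
    """Discrete TEO: psi[n] = x[n]**2 - x[n-1] * x[n+1] (length len(x) - 2)."""
    x = np.asarray(x, dtype=np.float64)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def zero_crossing_rate(x: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose sign bits differ."""
    signs = np.signbit(np.asarray(x, dtype=np.float64))
    return float(np.mean(signs[1:] != signs[:-1]))


def local_perturbation_pct(values: np.ndarray) -> float:
    """Mean |v[i] - v[i+1]| divided by mean(v), in percent.

    Pass glottal periods for jitter (local), or per-period peak amplitudes
    for shimmer (local).
    """
    v = np.asarray(values, dtype=np.float64)
    return float(100 * np.mean(np.abs(np.diff(v))) / np.mean(v))


# Illustration on a synthetic 200 Hz tone sampled at 16 kHz
sr = 16_000
n = np.arange(sr // 10)
omega = 2 * np.pi * 200 / sr
tone = 0.5 * np.cos(omega * n)
psi = teager_kaiser(tone)
# For A*cos(omega*n) the TEO is constant at A**2 * sin(omega)**2
print(psi.mean(), psi.std())
print(zero_crossing_rate(tone))  # about 2 * 200 / 16000 = 0.025
```

`teo_mean` and `teo_std` would then be the mean and standard deviation of `psi`, under whatever framing the pipeline applies.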
### Tier 2 — Biomechanical Features (Z-score only, raw permanently dropped)
11 proprietary physics-motivated features, published only as Z-scores against the bonafide
baseline. Raw values are dropped at export and are not available under any access tier. See
the gated SSA corpus for why these features are obfuscated.
| Column | Description |
|--------|-------------|
| `bico_f0_f1_z` | Bicoherence F0–F1 phase coupling Z-score |
| `bico_f1_f2_z` | Bicoherence F1–F2 phase coupling Z-score |
| `modgd_var_z` | Modified group delay variance Z-score |
| `pgv_magnitude_correlation_z` | Phase group velocity correlation Z-score |
| `pgv_total_z` | Total phase group velocity energy Z-score |
| `f1_velocity_z` | F1 transition rate Z-score |
| `f2_velocity_z` | F2 transition rate Z-score |
| `inertial_decay_residual_z` | Biomechanical inertia decay residual Z-score |
| `teo_std_high_z` | TEO high-band standard deviation Z-score |
| `teo_std_low_z` | TEO low-band standard deviation Z-score |
| `pitch_velocity_max_z` | Max F0 rate-of-change Z-score |
Z-scores are computed against the bonafide population baseline. Since this
export is spoof-only, the Z-score reference baseline is drawn from the bonafide
population in the full SSA corpus (LibriSeVoc/ITW/FoR/SONAR combined), not from
within this file.
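The card does not specify the estimator, so this is a minimal sketch assuming the conventional z = (x − μ_bonafide) / σ_bonafide, with mean and standard deviation taken only from a separate bonafide reference set, as described above. It cannot be applied to this file (raw Tier 2 values are not shipped); it only illustrates scoring one population against another. The function name and the raw column name `modgd_var` in the test are hypothetical, and `df_arena_v1` may use robust statistics such as median and MAD instead.

```python
import numpy as np
import pandas as pd


def zscore_against_reference(
    spoof: pd.DataFrame, bonafide: pd.DataFrame, columns: list[str]
) -> pd.DataFrame:
    """Z-score each column of `spoof` using the mean/std of `bonafide` only."""
    mu = bonafide[columns].mean()
    sigma = bonafide[columns].std(ddof=1)
    # Zero-variance reference columns would divide by zero; yield NaN instead
    sigma = sigma.replace(0, np.nan)
    z = (spoof[columns] - mu) / sigma
    return z.add_suffix("_z")
```

Because the baseline is bonafide-only, a spoof clip with z near 0 is typical of real speech on that feature, and a large |z| marks a departure from the bonafide population.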
---
## Provenance Watermark
All float features carry a deterministic FP16-resolution IP watermark:
```
seed = HMAC-SHA256(SSA_EXPORT_SECRET, col_name + "|SSA_v2_2026")
noise ~ Uniform(-0.004, +0.004) seeded by seed
value = float16(round(raw, 2) + noise)
```
The watermark is reproducible and verifiable. Any extracted subset is traceable
to this export version. Do not attempt to remove or alter the watermark — this
is a condition of use.
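A minimal Python sketch of the formula above, to show the mechanics. Two details are assumptions, not documented: the full 256-bit HMAC digest is used as a NumPy `default_rng` seed, and one noise value is drawn per row in order. `SSA_EXPORT_SECRET` is private, so this cannot reproduce the published values; `watermark_column` is a hypothetical name.

```python
import hashlib
import hmac

import numpy as np

EXPORT_VERSION = "SSA_v2_2026"


def watermark_column(raw: np.ndarray, col_name: str, secret: bytes) -> np.ndarray:
    """Round to 2 dp, add HMAC-seeded uniform noise in [-0.004, 0.004), cast to float16."""
    msg = f"{col_name}|{EXPORT_VERSION}".encode("utf-8")
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    # Assumption: the whole 32-byte digest seeds NumPy's default generator
    rng = np.random.default_rng(int.from_bytes(digest, "big"))
    noise = rng.uniform(-0.004, 0.004, size=np.shape(raw))
    return (np.round(np.asarray(raw, dtype=np.float64), 2) + noise).astype(np.float16)
```

The same secret and column name always yield the same noise, which is what makes the mark reproducible. float16 keeps 10 fraction bits, so its spacing is 2^-8 ≈ 0.0039 for values in [4, 8) and doubles with each further power of two; for columns measured in Hz or dB the final cast, not the ±0.004 noise, sets the stored resolution.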
---
## Usage Example
```python
import pandas as pd
df = pd.read_parquet('ssa_wavefake_features_v2.parquet')
print(df.shape) # (~134266, 57)
print(df['vocoder'].value_counts()) # breakdown by architecture
print(df['npvi'].describe())  # pooled over both corpora; per-corpus means below
# Split by source corpus via vocoder name
jsut_vocoders = {'melgan_large', 'full_band_melgan_large'}
df_jsut = df[df['vocoder'].isin(jsut_vocoders)]
df_ljspeech = df[~df['vocoder'].isin(jsut_vocoders)]
print(f"JSUT nPVI mean: {df_jsut['npvi'].mean():.1f}")
print(f"LJSpeech nPVI mean: {df_ljspeech['npvi'].mean():.1f}")
# Tier 2 Z-score features
tier2 = [c for c in df.columns if c.endswith('_z')]
print(df[tier2].describe())
```
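LightGBM and XGBoost accept NaN natively, but logistic regression does not, and `spectral_7k8k_entropy` is NaN wherever the codec gate applies. A minimal pandas sketch that separates metadata from features and, for NaN-intolerant models, median-imputes with a missingness flag. `METADATA_COLS` is copied from the schema above; `feature_matrix` and the `_missing` suffix are hypothetical. Since every row is spoof, `vocoder` is the natural target in this file on its own.

```python
import pandas as pd

METADATA_COLS = [
    "anon_id", "label", "vocoder", "tier", "brouhaha_graded",
    "source_licence", "source_dataset", "duration_ms", "duration_s",
]


def feature_matrix(df: pd.DataFrame, impute: bool = False) -> pd.DataFrame:
    """Drop metadata columns; optionally median-impute NaNs and add *_missing flags."""
    X = df.drop(columns=[c for c in METADATA_COLS if c in df.columns])
    X = X.astype("float32")
    if impute:
        # Decide which columns need imputation before adding flag columns
        nan_cols = [c for c in X.columns if X[c].isna().any()]
        for col in nan_cols:
            X[f"{col}_missing"] = X[col].isna().astype("float32")
            X[col] = X[col].fillna(X[col].median())
    return X
```

Typical use would be `X = feature_matrix(df, impute=True)` with `y = df["vocoder"]`. In a train/test setting, compute the medians on the training split only to avoid leakage.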
---
## Licence & Attribution
This file is released under **CC-BY-SA 4.0**, consistent with the WaveFake v1.2
source dataset licence. You are free to share and adapt this material for any
purpose, including commercial, under the following terms:
- **Attribution** — Credit Frank & Schönherr (2021) for WaveFake, and
Moonscape Software for the SSA feature extraction pipeline.
- **ShareAlike** — Derivatives must carry the same CC-BY-SA-4.0 licence.
Feature extraction code and Tier 2 biomechanical feature methodology are
copyright Moonscape Software. The ShareAlike obligation applies to the dataset,
not the extraction software.
---
## Citation
```bibtex
@dataset{kleingertner2026ssa_wavefake,
author = {Kleingertner, Chris},
title = {Moonscape SSA — WaveFake v1.2 Acoustic Features},
year = {2026},
publisher = {Moonscape Software},
url = {https://huggingface.co/datasets/moonscape-software/Synthetic_Speech_Atlas},
license = {CC-BY-SA-4.0}
}
@inproceedings{frank2021wavefake,
title = {WaveFake: A Dataset to Facilitate Audio Deepfake Detection},
author = {Frank, Joel and Schönherr, Lea},
booktitle = {NeurIPS Datasets and Benchmarks},
year = {2021}
}
@inproceedings{lavechin2022brouhaha,
title = {Brouhaha: Multi-task Training for Noise Speech Detection and Assessment},
author = {Lavechin, Marvin and others},
booktitle = {Interspeech},
year = {2022}
}
```
---
*Moonscape Software — Synthetic Speech Atlas*
*Export version: SSA_v2_2026 | Watermark: HMAC-SHA256 seeded FP16 noise*
Provided by: moonscape-software



