Prhokbvf556/Audio-VAE-Phonk-Dataset
Source: Hugging Face · Updated 2026-04-02 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Prhokbvf556/Audio-VAE-Phonk-Dataset
Description:
---
task_categories:
- audio-classification
size_categories:
- 100K<n<1M
---
# 🚗 Phonk Audio Dataset for Generative ML
This dataset contains hundreds of hours of high-quality Phonk music (Drift Phonk, Hard Phonk, etc.), specifically scraped, pre-processed, and formatted for training deep learning audio models.
It is well suited to training **Audio VAEs, EnCodec, TiTok, or Audio Diffusion / Transformer Prior models** from scratch.
## 📊 Dataset Specifications
The audio data has been heavily pre-processed to maximize training efficiency on TPUs/GPUs:
* **Format:** `TFRecord`
* **Sample Rate:** `32,000 Hz` (Optimized for generative ML, capturing frequencies up to 16kHz)
* **Channels:** `1` (Mono)
* **Chunk Size:** `131,072 samples` per chunk (exactly **4.096 seconds** of audio at 32kHz)
* **Data Type:** Raw Waveform (`float32` arrays)
* **Quality Control:**
  - Strict RMS-based silence removal.
  - MD5 hashing for chunk deduplication (no overlapping repeated segments).
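
The figures in the specification follow from the sample rate and chunk size alone. The short sketch below works them out; the variable names are illustrative and are not part of the dataset.

```python
# Quantities derived from the dataset specification.
SAMPLE_RATE_HZ = 32_000   # stated sample rate
CHUNK_SAMPLES = 2 ** 17   # 131,072 samples per chunk
BYTES_PER_SAMPLE = 4      # float32

chunk_seconds = CHUNK_SAMPLES / SAMPLE_RATE_HZ          # duration of one chunk
nyquist_hz = SAMPLE_RATE_HZ / 2                         # highest representable frequency
chunk_bytes = CHUNK_SAMPLES * BYTES_PER_SAMPLE          # raw payload, excluding TFRecord overhead
chunks_per_hour = 3600 * SAMPLE_RATE_HZ / CHUNK_SAMPLES # chunks in one hour of audio

print(f"chunk duration:  {chunk_seconds} s")
print(f"Nyquist limit:   {nyquist_hz} Hz")
print(f"raw payload:     {chunk_bytes // 1024} KiB per chunk")
print(f"chunks per hour: {chunks_per_hour:.1f}")
```

Each chunk is therefore 4.096 seconds long and carries 512 KiB of raw samples, and one hour of audio yields about 879 chunks before silence removal and deduplication.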
## 🛠️ Preprocessing Pipeline
The dataset was constructed using a high-throughput multi-core pipeline:
1. **Source:** YouTube Phonk/Drift mixes and playlists.
2. **Download & Extraction:** `yt-dlp` (bestaudio) -> `ffmpeg` (conversion to 32kHz, Mono, s16).
3. **Slicing:** Audio is loaded into memory and sliced into chunks of exactly `2^17` (131,072) samples.
4. **Filtering:** Chunks with RMS energy below 0.01 are discarded.
5. **Serialization:** Saved as `tf.train.Example` directly into `TFRecord` shards.
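
The original pipeline scripts are not included in this card, so the following is only a minimal NumPy sketch of steps 3 and 4 plus the MD5 deduplication mentioned in the specifications. The function names, the `1/32768` scaling from 16-bit PCM to `float32`, and the choice to drop a trailing partial chunk are assumptions made for illustration.

```python
import hashlib

import numpy as np

CHUNK_SAMPLES = 2 ** 17  # 131,072 samples, as stated in the card
RMS_THRESHOLD = 0.01     # as stated in the card


def pcm16_to_float32(pcm):
    """Scale int16 PCM to float32 in [-1, 1). The 1/32768 factor is an assumption."""
    return pcm.astype(np.float32) / 32768.0


def slice_filter_dedup(waveform, seen_hashes):
    """Slice a mono float32 waveform into chunks, dropping quiet and repeated ones."""
    kept = []
    n_chunks = len(waveform) // CHUNK_SAMPLES  # trailing partial chunk is dropped (assumption)
    for i in range(n_chunks):
        chunk = waveform[i * CHUNK_SAMPLES:(i + 1) * CHUNK_SAMPLES]
        # RMS energy, accumulated in float64 for numerical stability.
        rms = float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))
        if rms < RMS_THRESHOLD:
            continue  # treated as silence
        # Hash the raw sample bytes so byte-identical chunks are kept only once.
        digest = hashlib.md5(chunk.tobytes()).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(chunk)
    return kept


# Synthetic check: a tone, a silent chunk, then the same tone again.
t = np.arange(CHUNK_SAMPLES) / 32_000
tone = (0.5 * np.sin(2 * np.pi * 55.0 * t)).astype(np.float32)
silence = np.zeros(CHUNK_SAMPLES, dtype=np.float32)
demo = np.concatenate([tone, silence, tone])

seen = set()
chunks = slice_filter_dedup(demo, seen)
print(len(chunks))  # 1: the silent chunk and the repeated tone are both discarded
```

Passing the same `seen_hashes` set across files would extend deduplication to the whole corpus rather than a single track. Note that byte-level hashing only catches exact repeats; chunks that differ by even one sample are treated as distinct.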
## 💻 How to use (TensorFlow)
Since the data is stored in TFRecords, you can stream it directly into your training loop without downloading the entire dataset, which is ideal for Kaggle/Colab environments.
```python
import tensorflow as tf


def parse_tfrecord_fn(example):
    """Decode one serialized tf.train.Example into a float32 waveform."""
    feature_description = {
        # Each record holds one chunk of 131,072 (2^17) samples.
        "audio": tf.io.FixedLenFeature([131072], tf.float32),
    }
    example = tf.io.parse_single_example(example, feature_description)
    return example["audio"]


# Load dataset (can point directly to HF paths or local /dev/shm)
raw_dataset = tf.data.TFRecordDataset([
    "data/audio_vae_part_0001.tfrecord",
    "data/audio_vae_part_0002.tfrecord",
])
parsed_dataset = raw_dataset.map(parse_tfrecord_fn, num_parallel_calls=tf.data.AUTOTUNE)
parsed_dataset = parsed_dataset.batch(32).prefetch(tf.data.AUTOTUNE)

for audio_batch in parsed_dataset.take(1):
    print(audio_batch.shape)
    # Expected output: (32, 131072)
```
## ⚠️ Intended Use & Limitations
This dataset is designed for research on music generation architectures. Because the source material is lossily compressed (YouTube Opus/AAC) and then downsampled to 32kHz, it is intended for Lo-Fi / Phonk style generation, where content above 16kHz is not required.
Provider:
Prhokbvf556



