Prhokbvf556/Audio-VAE-Phonk-Dataset

Name: Prhokbvf556/Audio-VAE-Phonk-Dataset
Creator: Prhokbvf556
Published: 2026-04-02 18:17:21
License: No description available

Hugging Face · Updated 2026-04-02 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/Prhokbvf556/Audio-VAE-Phonk-Dataset


Resource description:

---
task_categories:
- audio-classification
size_categories:
- 100K<n<1M
---

# 🚗 Phonk Audio Dataset for Generative ML

This dataset contains hundreds of hours of high-quality Phonk music (Drift Phonk, Hard Phonk, etc.), scraped, pre-processed, and formatted for training deep learning audio models. It is suited for training **Audio VAEs, EnCodec, TiTok, or Audio Diffusion / Transformer Prior models** from scratch.

## 📊 Dataset Specifications

The audio data has been heavily pre-processed to maximize training efficiency on TPUs/GPUs:

* **Format:** `TFRecord`
* **Sample Rate:** `32,000 Hz` (optimized for generative ML; captures frequencies up to the 16 kHz Nyquist limit)
* **Channels:** `1` (Mono)
* **Chunk Size:** `131,072 samples` per chunk (exactly **4.096 seconds** of audio at 32 kHz)
* **Data Type:** Raw Waveform (`float32` arrays)
* **Quality Control:**
  - Strict RMS-based silence removal.
  - MD5 hashing for chunk deduplication (no overlapping repeated segments).

## 🛠️ Preprocessing Pipeline

The dataset was constructed using a high-throughput multi-core pipeline:

1. **Source:** YouTube Phonk/Drift mixes and playlists.
2. **Download & Extraction:** `yt-dlp` (bestaudio) -> `ffmpeg` (conversion to 32kHz, Mono, s16).
3. **Slicing:** Audio is loaded into memory and sliced into exact `2^17` (131,072) sample chunks.
4. **Filtering:** Chunks with RMS energy below 0.01 are discarded.
5. **Serialization:** Chunks are saved as `tf.train.Example` records directly into `TFRecord` shards.

## 💻 How to use (TensorFlow)

Since the data is stored in TFRecords, you can stream it directly into your training loop without downloading the entire dataset, which is ideal for Kaggle/Colab environments.
```python
import tensorflow as tf


def parse_tfrecord_fn(example):
    # Each record holds one fixed-length chunk of 131,072 float32 samples.
    feature_description = {
        "audio": tf.io.FixedLenFeature([131072], tf.float32),
    }
    example = tf.io.parse_single_example(example, feature_description)
    return example["audio"]


# Load dataset (can point directly to HF paths or local /dev/shm)
raw_dataset = tf.data.TFRecordDataset([
    "data/audio_vae_part_0001.tfrecord",
    "data/audio_vae_part_0002.tfrecord",
])

# Decode records in parallel, then batch and prefetch.
parsed_dataset = raw_dataset.map(parse_tfrecord_fn, num_parallel_calls=tf.data.AUTOTUNE)
parsed_dataset = parsed_dataset.batch(32).prefetch(tf.data.AUTOTUNE)

for audio_batch in parsed_dataset.take(1):
    print(audio_batch.shape)  # Expected output: (32, 131072)
```

## ⚠️ Intended Use & Limitations

This dataset is designed for research in music generation architectures. Because the source material is aggressively lossy-compressed (YouTube Opus/AAC) and downsampled to 32kHz, it is intended for Lo-Fi / Phonk style generation where high-frequency content above 16kHz is not required.
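The slicing, RMS filtering, and MD5 deduplication steps of the preprocessing pipeline described above can be sketched in plain NumPy. The original pipeline code is not published in this card, so this is only an illustration under stated assumptions: the helper name `chunk_and_filter` is hypothetical, the trailing partial chunk is assumed to be dropped, and the MD5 digest is assumed to be computed over the raw `float32` bytes of each chunk.

```python
import hashlib

import numpy as np

CHUNK_SIZE = 2 ** 17     # 131,072 samples per chunk, as stated in the card
SAMPLE_RATE = 32_000     # Hz
RMS_THRESHOLD = 0.01     # chunks with RMS below this value are discarded


def chunk_and_filter(waveform, seen_hashes=None):
    """Hypothetical helper: slice a mono waveform into fixed-size chunks,
    then drop near-silent chunks and exact duplicates."""
    if seen_hashes is None:
        seen_hashes = set()
    waveform = np.asarray(waveform, dtype=np.float32)

    # Assumption: a trailing partial chunk is discarded rather than padded.
    n_chunks = len(waveform) // CHUNK_SIZE

    kept = []
    for i in range(n_chunks):
        chunk = waveform[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]

        # RMS energy, accumulated in float64 to limit rounding error.
        rms = float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))
        if rms < RMS_THRESHOLD:
            continue

        # Assumption: deduplication hashes the raw float32 bytes of the chunk.
        digest = hashlib.md5(chunk.tobytes()).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(chunk)
    return kept


# Usage on synthetic data: one loud chunk, one silent chunk, a repeat of the
# loud chunk, and a short tail that does not fill a whole chunk.
rng = np.random.default_rng(0)
loud = rng.uniform(-0.5, 0.5, CHUNK_SIZE).astype(np.float32)
silent = np.zeros(CHUNK_SIZE, dtype=np.float32)
signal = np.concatenate([loud, silent, loud, loud[:1000]])

chunks = chunk_and_filter(signal)
print(len(chunks), chunks[0].shape, CHUNK_SIZE / SAMPLE_RATE)
```

Passing the same `seen_hashes` set across files would extend deduplication to the whole corpus rather than a single track. Note that exact-byte hashing only catches identical chunks; it would not detect the same audio re-encoded or shifted by a few samples.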

Provider:

Prhokbvf556
