vancenceho/youtube-spotify-audio-features

Hugging Face · updated 2026-04-20 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/vancenceho/youtube-spotify-audio-features
Description:
---
license: cdla-sharing-1.0
language:
- en
tags:
- music
pretty_name: Spotify-YouTube Audio Features
size_categories:
- 10K<n<100K
---

# Spotify–YouTube Audio Features

Tabular **librosa** audio features for tracks aligned with the Spotify / YouTube pipeline in the *viral-content-predictor* project. Each row is one Spotify `track_id` matched to a downloaded YouTube audio clip. The features are aggregate statistics (mean / std) computed on the decoded waveform.

## Files

| File | Description |
|------|-------------|
| `audio_features.csv` | One row per track: `track_id`, 89 derived feature columns (means/stds), `extraction_success`, `error_message`. |

## Data instance

- **Format:** CSV with a header row.
- **Rows:** on the order of **tens of thousands** of tracks. The exact count may change as the extraction job is updated.
- **Key column:** `track_id`, the Spotify track identifier. It is the join key to other project tables.
- **Label / target:** not included; this file contains **features only**.

## Feature groups (89 dimensions)

Features are extracted with **librosa** (notebook `02b_youtube_audio_feature_extraction.ipynb`) from the MP3s produced by the project’s YouTube-audio download step. The groups are:

- Tempo (1)
- RMS energy (2)
- Zero crossing rate (2)
- Spectral centroid, rolloff, bandwidth (6)
- Spectral contrast, 7 bands (14)
- MFCC, 13 coefficients (26)
- Chroma, 12 pitch classes (24)
- Tonnetz, 6 tonal-centroid dimensions (12)
- Onset strength (2)

Every group except tempo contributes a **mean** and a **std** column per dimension. See the CSV header for the exact column names.

Additional columns:

- **`extraction_success`**: boolean-like flag indicating whether feature extraction completed for that file.
- **`error_message`**: empty, or diagnostic text when extraction failed.

## Provenance

- **Audio source:** user-downloaded YouTube audio (`.mp3`) matched to Spotify metadata elsewhere in the pipeline.
- **Processing:** Python with **librosa**; optional multiprocessing for throughput.
- **This artifact:** contains no raw audio, only numeric summaries suitable for modeling and joins.

## Usage

```python
import pandas as pd

df = pd.read_csv("audio_features.csv")
# Join on track_id with the Spotify / YouTube tables in your project
```

## Limitations

- Coverage is limited to tracks with a successful YouTube match and a readable audio file.
- Feature definitions follow librosa defaults. Hyperparameters such as hop length and FFT size are fixed in the extraction notebook.

## License

This dataset card specifies **CDLA-Sharing-1.0**. Make sure any redistribution of the CSV, or of data derived from it, complies with that license. It must also comply with the licenses of the underlying **Spotify** and **YouTube** data from which the file was built.
Provided by:
vancenceho