
xuyuefan111/audioform_dataset

Source: Hugging Face | Updated: 2026-04-21 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/xuyuefan111/audioform_dataset
Description:
---
license: mit
task_categories:
- image-to-text
- text-to-image
- audio-classification
- image-classification
- tabular-classification
tags:
- audio
- image
- multimodal
- visualization
- audio-visualization
- 3d-visualization
- synthetic
- proof-of-concept
- frequency-estimation
- generative-audio
- music-visualization
---

[![Website](https://img.shields.io/badge/webXOS.netlify.app-Explore_Apps-00d4aa?style=for-the-badge&logo=netlify&logoColor=white)](https://webxos.netlify.app)
[![GitHub](https://img.shields.io/badge/GitHub-webxos/webxos-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/webxos/webxos)
[![Hugging Face](https://img.shields.io/badge/Hugging_Face-🤗_webxos-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/webxos)
[![Follow on X](https://img.shields.io/badge/Follow_@webxos-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/webxos)

## Audioform_Dataset_v1

This dataset is the very first output from **AUDIOFORM**, a Three.js powered 3D audio visualization tool that turns audio files into beautiful, timestamped visual frames with rich metadata. **AUDIOFORM** by webXOS is available for download in the /audioform/ folder of this repo so developers can create their own similar datasets.
AUDIOFORM is a synthetic harmonic oscillator that runs in HTML; think of it as the "Hello World" / MNIST-style dataset application for audio-to-visual multimodal machine learning.

This dataset contains **10 captured frames** from a short uploaded WAV file (played at 1× speed), together with per-frame metadata including dominant frequency, timestamp, and capture info.

## Dataset Structure

```
audioform_dataset/
├── images/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ... (10 PNG frames total)
├── metadata.csv    # Main metadata file (Hugging Face viewer uses this)
└── README.md
```

| Column         | Type   | Description                                                        | Example Value              |
|----------------|--------|--------------------------------------------------------------------|----------------------------|
| `file_name`    | string | Relative path to the visualization PNG (required by Hugging Face)  | `images/frame_0001.png`    |
| `frame_id`     | int    | Sequential frame number (0-based)                                  | 0, 1, 2, …, 9              |
| `timestamp`    | float  | Time in seconds when the frame was captured from the audio         | 5.365, 6.219, 9.504        |
| `frequency`    | int    | Dominant (main) detected audio frequency at capture time, in Hz    | 0 (in this tiny sample)    |
| `time_scale`   | int    | Playback speed multiplier used during visualization                | 1                          |
| `capture_date` | string | UTC ISO timestamp when the frame was rendered                      | `2026-01-13T19:57:36.427Z` |

- See how fast a tiny diffusion model, GAN, or LoRA can memorize and regenerate these exact 10 styles.
- Use the frames as style references for ControlNet, IP-Adapter, or fine-tuning SD to adopt this neon 3D audio-viz aesthetic.

```
This dataset shows the **format** AUDIOFORM produces.
→ Feed it real music, voices, field recordings, synths
→ Generate 1k–100k+ frames
→ Add labels (genre, instrument, mood, multiple freq peaks…)
→ Unlock serious applications:
  - Music video auto-generation
  - Visual audio classifiers
  - Audio-conditioned image/video generation
  - Interactive music → 3D art installations
  - Novel multimodal music understanding models
```

## Dataset Description

This dataset was generated using AUDIOFORM, a 3D audio visualization system.

- **Total Frames**: 10
- **Generation Date**: 2026-01-13
- **Audio Type**: Uploaded WAV File
- **Time Scaling**: 1x

## Dataset Structure

- `images/`: Contains all captured frames in PNG format
- `metadata.csv`: Contains classification data for each frame

## Metadata Columns

- `file_name`: Relative path to the image file (e.g., images/frame_0001.png). **REQUIRED for Hugging Face**
- `frame_id`: Unique identifier for each frame
- `timestamp`: Time in seconds when the frame was captured
- `frequency`: Audio frequency at capture time (Hz)
- `time_scale`: Playback speed multiplier
- `capture_date`: ISO date string of capture

## Intended Use

This dataset is intended for training machine learning models on audio visualization patterns, waveform classification, or generative AI tasks.
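
The metadata columns described above can be read into typed records with the Python standard library alone. The sketch below is illustrative rather than part of AUDIOFORM: `FrameRecord`, `parse_metadata`, `check_records`, and `SAMPLE_CSV` are hypothetical names, and the sample rows only reuse the example values from the README (the second and third `capture_date` values are invented placeholders). It also assumes `frame_id` starts at 0 and increases by one per row, as the column description suggests.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FrameRecord:
    """One row of metadata.csv, using the column types listed in the README."""
    file_name: str
    frame_id: int
    timestamp: float
    frequency: int
    time_scale: int
    capture_date: datetime


def parse_metadata(csv_text: str) -> list[FrameRecord]:
    """Parse metadata.csv text into typed records, one per frame."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append(
            FrameRecord(
                file_name=row["file_name"],
                frame_id=int(row["frame_id"]),
                timestamp=float(row["timestamp"]),
                frequency=int(row["frequency"]),
                time_scale=int(row["time_scale"]),
                # Swap the trailing "Z" for an explicit UTC offset so that
                # datetime.fromisoformat also accepts it on Python < 3.11.
                capture_date=datetime.fromisoformat(
                    row["capture_date"].replace("Z", "+00:00")
                ),
            )
        )
    return records


def check_records(records: list[FrameRecord]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for expected_id, rec in enumerate(records):
        # Assumption: frame_id is 0-based and sequential.
        if rec.frame_id != expected_id:
            problems.append(f"row {expected_id}: frame_id is {rec.frame_id}")
        if not rec.file_name.startswith("images/"):
            problems.append(f"row {expected_id}: unexpected path {rec.file_name}")
    timestamps = [rec.timestamp for rec in records]
    if timestamps != sorted(timestamps):
        problems.append("timestamps are not in non-decreasing order")
    return problems


# Hypothetical sample rows in the documented column order (not the real file).
SAMPLE_CSV = """file_name,frame_id,timestamp,frequency,time_scale,capture_date
images/frame_0001.png,0,5.365,0,1,2026-01-13T19:57:36.427Z
images/frame_0002.png,1,6.219,0,1,2026-01-13T19:57:37.281Z
images/frame_0003.png,2,9.504,0,1,2026-01-13T19:57:40.566Z
"""

if __name__ == "__main__":
    frames = parse_metadata(SAMPLE_CSV)
    print(len(frames), "frames parsed;", check_records(frames) or "no problems")
```

To run this against the real file, pass the contents of `metadata.csv` to `parse_metadata` instead of `SAMPLE_CSV`. If the Hugging Face `datasets` library is installed, `load_dataset("imagefolder", data_dir="audioform_dataset")` should pick up the same `metadata.csv` through its `file_name` column, though that path is not exercised here.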
Provider:
xuyuefan111