
IamXiangyu/osu-beatmaps-duplicated

Hugging Face | Updated: 2026-03-19 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/IamXiangyu/osu-beatmaps-duplicated
Description:
---
license: other
license_name: osu-terms
license_link: https://osu.ppy.sh/legal/terms
task_categories:
- audio-classification
tags:
- music
- rhythm-game
- beatmaps
- osu
size_categories:
- 10K<n<100K
configs:
- config_name: original
  data_files:
  - split: train
    path: original/*.tar
- config_name: compressed
  data_files:
  - split: train
    path: compressed/*.tar
---

# osu! Beatmaps Dataset (WebDataset)

A collection of ranked and loved osu! beatmaps with audio and chart data, in WebDataset format.

## Dataset Variants

| Variant | Audio Format | Description |
|---------|--------------|-------------|
| `original` | MP3/OGG/WAV | Full-quality original audio files |
| `compressed` | 64 kbps mono Opus | Compressed audio for a smaller download |

```python
from datasets import load_dataset

# Repository ID as given in the upstream card; this mirror is hosted as
# IamXiangyu/osu-beatmaps-duplicated.

# Load the original-audio variant
ds_original = load_dataset("project-riz/osu-beatmaps", "original", streaming=True)

# Load the compressed-audio variant
ds_compressed = load_dataset("project-riz/osu-beatmaps", "compressed", streaming=True)
```

## Statistics

| Metric | Value |
|--------|-------|
| Total beatmaps | 213,068 |
| Unique audio tracks (samples) | 46,386 |
| Total audio duration | 2,526 hours |
| Total playable beatmap length | 8,419 hours |
| Date range | 2007-10-06 to 2025-12-31 |

## Sample Structure

Each sample represents one unique audio track together with all of its associated beatmaps:

```plain
{seq_num:06d}.mp3|opus   # Audio file
{seq_num:06d}.json       # Metadata + embedded beatmap charts
```

### JSON Schema

```jsonc
{
  "audio_hash": "sha256_hash_of_audio",
  "audio_length": 167.92,
  "beatmaps": [
    {
      "beatmapset_id": 1,
      "beatmap_id": 75,
      "approved": 1,
      // ...
      "content": "osu file format v14\n..."
    }
  ]
}
```

Audio files are deduplicated by SHA-256 hash, and multiple beatmaps that share the same audio are grouped into a single sample. In both variants, `audio_hash` refers to the original audio. In keeping with WebDataset conventions, all original audio files use the `.mp3` extension **regardless of their actual format**.
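The sample layout above can also be read directly from a downloaded shard, without the `datasets` library. The following is a minimal sketch using only the Python standard library, not the dataset author's tooling. It assumes that each sample is one `<key>.json` member plus one `<key>.mp3` or `<key>.opus` member, and that `audio_hash` is a lowercase hex SHA-256 digest. The function names `iter_samples` and `audio_hash_matches` are hypothetical.

```python
import hashlib
import json
import posixpath
import tarfile
from typing import Dict, Iterator, Tuple


def iter_samples(tar_path: str) -> Iterator[Tuple[str, bytes, dict]]:
    """Yield (key, audio_bytes, metadata) for each sample in one WebDataset shard.

    Assumes every sample is a '<key>.json' member plus one audio member
    ('<key>.mp3' in the original variant, '<key>.opus' in the compressed one).
    """
    pending: Dict[str, Dict[str, bytes]] = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # WebDataset convention: the key is the path up to the first dot
            # in the file name; the remainder is the extension.
            dirname, basename = posixpath.split(member.name)
            stem, _, ext = basename.partition(".")
            key = posixpath.join(dirname, stem)
            fileobj = tar.extractfile(member)
            parts = pending.setdefault(key, {})
            parts[ext] = fileobj.read()
            # Emit the sample as soon as both its JSON and its audio are present.
            audio_ext = next((e for e in ("mp3", "opus") if e in parts), None)
            if "json" in parts and audio_ext is not None:
                del pending[key]
                yield key, parts[audio_ext], json.loads(parts["json"])


def audio_hash_matches(audio: bytes, meta: dict) -> bool:
    """Recompute the dedup key; assumes `audio_hash` is a hex SHA-256 digest.

    Per the card, the hash always refers to the ORIGINAL audio, so this is
    only expected to return True for samples from the `original` variant.
    """
    return hashlib.sha256(audio).hexdigest() == meta.get("audio_hash")
```

For example, `sum(len(meta["beatmaps"]) for _, _, meta in iter_samples(path))` counts the charts stored in one shard. Because each beatmap appears in exactly one sample, the sum over all shards would be expected to match the total beatmap count in Statistics. Members are buffered by key, so the sketch does not depend on the order of the JSON and audio files within the tar.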
## Metadata Fields

Most fields are taken from the [osu! API v1 response](https://github.com/ppy/osu-api/wiki#response). The `content` field contains the full `.osu` beatmap chart file (UTF-8 encoded).

## License

This dataset contains user-generated content from osu! and is subject to the [osu! Terms of Service](https://osu.ppy.sh/legal/terms). The beatmap charts are created by community mappers, and the audio tracks are copyrighted by their respective owners. This dataset is provided for research purposes only.
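The `content` field described under Metadata Fields holds raw `.osu` text, and `approved` is one of the osu! API v1 fields. The following is a minimal parsing sketch, not the dataset author's code. It assumes the publicly documented `.osu` layout: a version line, `[Section]` headers, `key:value` lines in General/Editor/Metadata/Difficulty/Colours, and comma-separated records elsewhere. The status codes follow the osu! API v1 documentation. The helpers `parse_osu`, `hit_object_times` and `status_name` are hypothetical names.

```python
from typing import List, Optional

# Sections whose lines are "key:value" pairs; the others (e.g. [Events],
# [TimingPoints], [HitObjects]) are comma-separated records.
KEY_VALUE_SECTIONS = {"General", "Editor", "Metadata", "Difficulty", "Colours"}

# Ranked-status codes for `approved`, as listed in the osu! API v1 docs.
APPROVED_STATUS = {
    4: "loved", 3: "qualified", 2: "approved", 1: "ranked",
    0: "pending", -1: "WIP", -2: "graveyard",
}


def parse_osu(content: str) -> dict:
    """Split a .osu chart into its format version and named sections."""
    lines = content.lstrip("\ufeff").splitlines()  # drop a UTF-8 BOM if present
    result: dict = {"version": lines[0].strip() if lines else "", "sections": {}}
    current: Optional[str] = None
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            result["sections"][current] = {} if current in KEY_VALUE_SECTIONS else []
            continue
        if current is None:
            continue  # ignore stray lines before the first section header
        section = result["sections"][current]
        if isinstance(section, dict):
            # Split on the first colon only, since values may contain colons.
            key, _, value = line.partition(":")
            section[key.strip()] = value.strip()
        else:
            section.append(line.split(","))
    return result


def hit_object_times(parsed: dict) -> List[int]:
    """Return hit-object start times in milliseconds (third field of each record)."""
    return [int(fields[2]) for fields in parsed["sections"].get("HitObjects", [])]


def status_name(approved) -> str:
    """Map an `approved` value (int or numeric string) to its status name."""
    return APPROVED_STATUS.get(int(approved), "unknown")
```

Keeping the key/value sections as dictionaries and the record sections as lists of raw string fields leaves type conversion to the caller. This matters because record formats differ between sections and between file-format versions.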
Provided by:
IamXiangyu