
thejorseman/CloneHeroDatasetCharts

Hugging Face | Updated 2026-04-18 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/thejorseman/CloneHeroDatasetCharts
Description:
---
language:
- en
license: mit
task_categories:
- audio-to-audio
- token-classification
task_ids:
- music-generation
tags:
- music
- gaming
- clone-hero
- rhythm-game
- guitar-hero
- audio-generation
- chart-generation
pretty_name: Clone Hero Charts Dataset
size_categories:
- 10K<n<100K
---

# Clone Hero Charts Dataset

## Dataset Description

Tokenized [Clone Hero](https://clonehero.net/) charts paired with beat-level audio conditioning. Each row is one instrument track (guitar, bass, or drums) from one song.

| Feature | Value |
|---------|-------|
| Total rows (`train`) | 43,665 |
| Parquet shards | 1,753 |
| Audio: MERT embeddings | Yes, `[num_beats, 768]` |
| Audio: log-mel frames | Yes, `[num_beats, 32, 128]` |

## Dataset Structure

### Data Fields

In the shapes below, `B` is the row's `num_beats`.

| Column | Type | Description |
|--------|------|-------------|
| `song_id` | string | MD5 hash of artist + title |
| `instrument` | string | `"guitar"` / `"bass"` / `"drums"` |
| `source_format` | string | `"chart"` or `"midi"` |
| `tokens` | list[int32] | Token sequence (model target) |
| `num_tokens` | int32 | Token count |
| `num_beats` | int32 | Beat count |
| `mert_embeddings` | list[list[float32]] | MERT embeddings per beat, `[B, 768]` |
| `logmel_frames` | list[list[list[float32]]] | Log-mel spectrogram per beat, `[B, 32, 128]` |
| `beat_times_s` | list[float32] | Beat start times (seconds) |
| `beat_durations_s` | list[float32] | Beat durations (seconds) |
| `bpm_at_beat` | list[float32] | BPM at each beat |
| `time_sig_num_at_beat` | list[int32] | Time signature numerator at each beat |
| `time_sig_den_at_beat` | list[int32] | Time signature denominator at each beat |
| `song_name` | string | Song title |
| `artist` | string | Artist name |
| `genre` | string | Genre tag |
| `charter` | string | Charter name |
| `year` | int32 | Release year |
| `song_length_ms` | int32 | Song length (milliseconds) |
| `difficulty` | int32 | Difficulty 0–6 (−1 = unset) |
| `resolution` | int32 | Tick resolution in ticks per quarter note (normalised to 192) |
| `has_star_power` | bool | Track contains star-power sections |
| `has_solo` | bool | Track contains solo sections |
| `has_dedicated_stem` | bool | Instrument-specific audio stem available |
| `num_notes` | int32 | Total note count |
| `notes_per_beat_mean` | float32 | Average notes per beat |
| `chord_ratio` | float32 | Fraction of notes that are chords |
| `sustain_mean_ticks` | float32 | Mean sustain length (ticks) |
| `bpm_mean` | float32 | Mean BPM |
| `bpm_std` | float32 | BPM standard deviation |

### Data Splits

| Split | Rows |
|-------|------|
| `train` | 43,665 |

## Usage

```python
from datasets import load_dataset

# Stream the Parquet shards lazily instead of downloading everything up front (low RAM).
ds = load_dataset("thejorseman/CloneHeroDatasetCharts", split="train", streaming=True)

# Each row is one instrument track of one song.
for row in ds:
    print(row["song_name"], row["instrument"], len(row["tokens"]))
```

## Source Data

Charts were scraped from public Clone Hero repositories in `.chart` and `.mid` format. Audio conditioning was extracted with:

- **MERT** (`m-a-p/MERT-v1-330M`): 768-dimensional embeddings, mean-pooled per beat
- **Log-mel**: 128-bin spectrograms, resampled to 32 time frames per beat

## License

MIT. See [LICENSE](LICENSE).

## Citation

```bibtex
@misc{clonecharter2025,
  title  = {Clone Hero Charts Dataset},
  author = {The Jorseman},
  year   = {2025},
  url    = {https://huggingface.co/datasets/thejorseman/CloneHeroDatasetCharts}
}
```
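The Data Fields table above implies that every per-beat column has the same length, `num_beats`, and that `num_tokens` equals the length of `tokens`. The following is a minimal sketch for checking that alignment on one row before it goes into a training loop. The helper name `check_row` is hypothetical. Returning a list of problem strings instead of raising is an illustrative choice, not part of the dataset.

```python
import numpy as np

# Per-beat list columns from the Data Fields table; each should have num_beats entries.
PER_BEAT_COLUMNS = (
    "beat_times_s",
    "beat_durations_s",
    "bpm_at_beat",
    "time_sig_num_at_beat",
    "time_sig_den_at_beat",
)


def check_row(row: dict) -> list:
    """Hypothetical helper: return alignment problems (an empty list means the row is consistent)."""
    problems = []
    num_beats = row["num_beats"]
    if row["num_tokens"] != len(row["tokens"]):
        problems.append("num_tokens does not match len(tokens)")
    for name in PER_BEAT_COLUMNS:
        if len(row[name]) != num_beats:
            problems.append(f"{name} has {len(row[name])} entries, expected {num_beats}")
    # Audio conditioning shapes documented in the card as [B, 768] and [B, 32, 128].
    expected_shapes = {
        "mert_embeddings": (num_beats, 768),
        "logmel_frames": (num_beats, 32, 128),
    }
    for name, shape in expected_shapes.items():
        actual = np.asarray(row[name], dtype=np.float32).shape
        if actual != shape:
            problems.append(f"{name} has shape {actual}, expected {shape}")
    return problems
```

With the streaming dataset from the Usage section, `for row in ds.take(100): assert not check_row(row)` would spot-check the first 100 rows.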
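The dataset ships only a `train` split, and up to three rows (guitar, bass, drums) can share one `song_id`. To hold out validation data without leaking a song's audio across splits, one option is to bucket rows by `song_id`. This sketch assumes `song_id` is stored as a hexadecimal MD5 digest. The 5% validation share and the function name `assign_split` are illustrative and hypothetical.

```python
def assign_split(song_id: str, val_percent: int = 5) -> str:
    """Hypothetical song-level split: every track of a song lands in the same split.

    ASSUMPTION: song_id is a hexadecimal MD5 digest, so it parses as a base-16 integer.
    """
    bucket = int(song_id, 16) % 100  # deterministic bucket in [0, 100)
    return "validation" if bucket < val_percent else "train"
```

On the streaming dataset, `ds.filter(lambda row: assign_split(row["song_id"]) == "train")` keeps only training songs. The bucket depends only on `song_id`, so the assignment is reproducible across runs and machines.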
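The `resolution` and `sustain_mean_ticks` columns are expressed in chart ticks. The following sketch shows two conversions. The first rescales a tick position from a chart's original resolution to the normalised 192. The second turns a tick span into seconds inside a constant-tempo section. It assumes that BPM counts quarter notes per minute and that resolution is ticks per quarter note, as in the `.chart` and MIDI formats. The rounding rule and both function names are assumptions; the card does not document the dataset's actual preprocessing.

```python
TARGET_RESOLUTION = 192  # ticks per quarter note after normalisation (value from the card)


def normalise_tick(tick: int, source_resolution: int) -> int:
    """Rescale a tick position to TARGET_RESOLUTION.

    ASSUMPTION: round to the nearest tick; the dataset's exact rule is not documented.
    """
    return round(tick * TARGET_RESOLUTION / source_resolution)


def ticks_to_seconds(ticks: float, bpm: float, resolution: int = TARGET_RESOLUTION) -> float:
    """Convert a tick span to seconds; only valid while the tempo stays at `bpm`."""
    quarter_notes = ticks / resolution
    return quarter_notes * 60.0 / bpm
```

For example, `normalise_tick(480, 480)` gives 192, and `ticks_to_seconds(96, 120.0)` gives 0.25 (an eighth note at 120 BPM). For songs whose tempo varies (see `bpm_at_beat` and `bpm_std`), `ticks_to_seconds(row["sustain_mean_ticks"], row["bpm_mean"])` is only a rough estimate.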
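The Source Data section says that MERT features are mean-pooled per beat and that log-mel spectrograms are resampled to 32 frames per beat, but it does not give the exact procedure. The following NumPy sketch shows one plausible way to do both. It starts from frame-level features with known timestamps and uses the `beat_times_s` / `beat_durations_s` grid. Several details are assumptions: the half-open beat window, the nearest-frame fallback for beats that contain no frames, and linear interpolation for resampling. The MERT model call itself is omitted, and both function names are hypothetical.

```python
import numpy as np


def beat_mean_pool(frames, frame_times, beat_times, beat_durations):
    """Mean-pool frame-level features [T, D] into per-beat vectors [B, D].

    ASSUMPTION: a frame belongs to a beat when its timestamp lies in
    [start, start + duration); a beat with no frame copies the nearest frame.
    """
    frames = np.asarray(frames, dtype=np.float32)
    frame_times = np.asarray(frame_times, dtype=np.float64)
    pooled = np.empty((len(beat_times), frames.shape[1]), dtype=np.float32)
    for i, (start, dur) in enumerate(zip(beat_times, beat_durations)):
        in_beat = (frame_times >= start) & (frame_times < start + dur)
        if in_beat.any():
            pooled[i] = frames[in_beat].mean(axis=0)
        else:
            pooled[i] = frames[np.argmin(np.abs(frame_times - start))]
    return pooled


def beat_resample(mel, frame_times, beat_times, beat_durations, n_frames=32):
    """Resample a log-mel spectrogram [T, n_mels] to [B, n_frames, n_mels].

    ASSUMPTION: linear interpolation at n_frames evenly spaced times inside each beat;
    np.interp clamps to the first/last frame outside the audio.
    """
    mel = np.asarray(mel, dtype=np.float32)
    frame_times = np.asarray(frame_times, dtype=np.float64)  # must be increasing
    out = np.empty((len(beat_times), n_frames, mel.shape[1]), dtype=np.float32)
    for i, (start, dur) in enumerate(zip(beat_times, beat_durations)):
        targets = start + dur * np.arange(n_frames) / n_frames
        for m in range(mel.shape[1]):
            out[i, :, m] = np.interp(targets, frame_times, mel[:, m])
    return out
```

Given 768-dimensional MERT frames and 128-bin log-mel frames, the outputs have the `[num_beats, 768]` and `[num_beats, 32, 128]` shapes listed in the Data Fields table. The per-bin loop keeps the sketch readable; a production version would vectorise it.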
Provided by:
thejorseman