thejorseman/CloneHeroDatasetCharts
Source: Hugging Face. Updated 2026-04-18; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/thejorseman/CloneHeroDatasetCharts
Resource description:
---
language:
- en
license: mit
task_categories:
- audio-to-audio
- token-classification
task_ids:
- music-generation
tags:
- music
- gaming
- clone-hero
- rhythm-game
- guitar-hero
- audio-generation
- chart-generation
pretty_name: Clone Hero Charts Dataset
size_categories:
- 10K<n<100K
---
# Clone Hero Charts Dataset
## Dataset Description
Tokenized [Clone Hero](https://clonehero.net/) charts with beat-level audio conditioning.
Each row is one instrument track (guitar / bass / drums) from one song.
| Feature | Value |
|---------|-------|
| Total rows (`train`) | 43,665 |
| Parquet shards | 1,753 |
| Audio: MERT embeddings | Yes [num_beats, 768] |
| Audio: log-mel frames | Yes [num_beats, 32, 128] |
## Dataset Structure
### Data Fields
| Column | Type | Description |
|--------|------|-------------|
| `song_id` | string | MD5 hash of artist + title |
| `instrument` | string | `"guitar"` / `"bass"` / `"drums"` |
| `source_format` | string | `"chart"` or `"midi"` |
| `tokens` | list[int32] | Token sequence (model target) |
| `num_tokens` | int32 | Token count |
| `num_beats` | int32 | Beat count |
| `mert_embeddings` | list[list[float32]] | MERT embeddings per beat [B, 768] |
| `logmel_frames` | list[list[list[float32]]] | Log-mel spectrogram [B, 32, 128] |
| `beat_times_s` | list[float32] | Beat start times (seconds) |
| `beat_durations_s` | list[float32] | Beat durations (seconds) |
| `bpm_at_beat` | list[float32] | BPM at each beat |
| `time_sig_num_at_beat` | list[int32] | Time signature numerator |
| `time_sig_den_at_beat` | list[int32] | Time signature denominator |
| `song_name` | string | Song title |
| `artist` | string | Artist name |
| `genre` | string | Genre tag |
| `charter` | string | Charter name |
| `year` | int32 | Release year |
| `song_length_ms` | int32 | Song length in milliseconds |
| `difficulty` | int32 | Difficulty 0–6 (−1 = unset) |
| `resolution` | int32 | Tick resolution (normalised to 192) |
| `has_star_power` | bool | Track contains star-power sections |
| `has_solo` | bool | Track contains solo sections |
| `has_dedicated_stem` | bool | Instrument-specific audio stem available |
| `num_notes` | int32 | Total note count |
| `notes_per_beat_mean` | float32 | Average notes per beat |
| `chord_ratio` | float32 | Fraction of notes that are chords |
| `sustain_mean_ticks` | float32 | Mean sustain length in ticks |
| `bpm_mean` | float32 | Mean BPM |
| `bpm_std` | float32 | BPM standard deviation |
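The nested list columns can be converted to arrays before modelling. The following is a minimal sketch, assuming NumPy is installed and that each row follows the shapes listed in the table above. `row_to_arrays` is a hypothetical helper for illustration and is not part of the dataset's own tooling.

```python
import numpy as np


def row_to_arrays(row):
    """Convert the per-beat list columns of one row into NumPy arrays.

    Hypothetical helper: validates the shapes documented in the field table.
    """
    num_beats = row["num_beats"]
    tokens = np.asarray(row["tokens"], dtype=np.int32)
    mert = np.asarray(row["mert_embeddings"], dtype=np.float32)
    logmel = np.asarray(row["logmel_frames"], dtype=np.float32)
    beat_times = np.asarray(row["beat_times_s"], dtype=np.float32)

    # Shape checks against the documented layout [B, 768] and [B, 32, 128].
    if mert.shape != (num_beats, 768):
        raise ValueError(f"unexpected MERT shape {mert.shape}")
    if logmel.shape != (num_beats, 32, 128):
        raise ValueError(f"unexpected log-mel shape {logmel.shape}")
    if beat_times.shape != (num_beats,):
        raise ValueError(f"unexpected beat_times_s shape {beat_times.shape}")
    if tokens.shape != (row["num_tokens"],):
        raise ValueError(f"unexpected tokens shape {tokens.shape}")

    return {
        "tokens": tokens,
        "mert": mert,
        "logmel": logmel,
        "beat_times_s": beat_times,
    }
```

Failing early on a shape mismatch keeps malformed rows out of a training batch, at the cost of one extra copy per row.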
### Data Splits
| Split | Rows |
|-------|------|
| `train` | 43,665 |
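Because only a `train` split is published and each song can contribute up to three rows (one per instrument), a held-out set is best carved out per song rather than per row. The sketch below is one possible approach, not something the dataset prescribes: it hashes `song_id` into a bucket so that all tracks of a song land in the same split. `assign_split` and `val_percent` are hypothetical names.

```python
import hashlib


def assign_split(song_id, val_percent=10):
    """Deterministically map a song to "train" or "validation".

    Hypothetical helper: every row sharing a song_id gets the same split,
    which avoids leaking a song's audio features across splits.
    """
    digest = hashlib.sha256(song_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "validation" if bucket < val_percent else "train"
```

The assignment depends only on `song_id`, so it stays stable across runs and works with streaming, where the full row count is not known up front.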
## Usage
```python
from datasets import load_dataset

# Stream rows shard by shard instead of downloading everything (low RAM):
ds = load_dataset("thejorseman/CloneHeroDatasetCharts", split="train", streaming=True)

for row in ds:
    print(row["song_name"], row["instrument"], len(row["tokens"]))
```
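To work with a single instrument, the streamed rows can be filtered on the fly. This is a minimal sketch using a plain generator, so it works on any iterable of row dictionaries, including the streamed dataset above. `iter_instrument` and `max_tokens` are hypothetical names introduced for illustration.

```python
def iter_instrument(rows, instrument, max_tokens=None):
    """Yield rows for one instrument, optionally skipping long token sequences.

    Hypothetical helper: `rows` is any iterable of dicts with the
    `instrument` and `num_tokens` columns.
    """
    for row in rows:
        if row["instrument"] != instrument:
            continue
        if max_tokens is not None and row["num_tokens"] > max_tokens:
            continue
        yield row
```

For example, `iter_instrument(ds, "drums", max_tokens=4096)` would lazily yield only drum tracks with at most 4096 tokens; the threshold here is arbitrary.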
## Source Data
Charts were scraped from public Clone Hero repositories in `.chart` and `.mid` formats.
Audio conditioning features were extracted with:
- **MERT** (`m-a-p/MERT-v1-330M`) — 768-dimensional embeddings, mean-pooled per beat
- **Log-mel** — 128-bin spectrograms resampled to 32 time frames per beat
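The two per-beat operations named above can be sketched in NumPy. The dataset's actual extraction code is not shown here, so this is only an illustration under stated assumptions: frames are assigned to a beat by their timestamp, and the time axis is resampled by linear interpolation. Both function names and the interpolation choice are assumptions.

```python
import numpy as np


def mean_pool_per_beat(frames, frame_times_s, beat_times_s, beat_durations_s):
    """Average frame-level features [T, D] inside each beat window -> [B, D].

    Assumption: a frame belongs to a beat if its timestamp falls in
    [beat_start, beat_start + beat_duration). Empty beats stay at zero.
    """
    frames = np.asarray(frames, dtype=np.float32)
    frame_times_s = np.asarray(frame_times_s, dtype=np.float64)
    pooled = np.zeros((len(beat_times_s), frames.shape[1]), dtype=np.float32)
    for i, (start, dur) in enumerate(zip(beat_times_s, beat_durations_s)):
        mask = (frame_times_s >= start) & (frame_times_s < start + dur)
        if mask.any():
            pooled[i] = frames[mask].mean(axis=0)
    return pooled


def resample_time_axis(mel, num_frames=32):
    """Resample a [T, n_mels] spectrogram slice to [num_frames, n_mels].

    Assumption: linear interpolation along time, one mel bin at a time.
    """
    mel = np.asarray(mel, dtype=np.float32)
    src = np.linspace(0.0, 1.0, mel.shape[0])
    dst = np.linspace(0.0, 1.0, num_frames)
    columns = [np.interp(dst, src, mel[:, k]) for k in range(mel.shape[1])]
    return np.stack(columns, axis=1).astype(np.float32)
```

Resampling every beat to a fixed number of frames makes the feature tensor independent of tempo, which is why a slow beat and a fast beat both end up with 32 time frames.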
## License
MIT — see [LICENSE](LICENSE).
## Citation
```bibtex
@misc{clonecharter2025,
title = {Clone Hero Charts Dataset},
author = {The Jorseman},
year = {2025},
url = {https://huggingface.co/datasets/thejorseman/CloneHeroDatasetCharts}
}
```
Provider:
thejorseman



