MRamani/manus-piano-chord-corpus

Source: Hugging Face. Updated 2026-01-11; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/MRamani/manus-piano-chord-corpus
Description:
---
license: cc-by-4.0
---

# A Comprehensive Corpus of Biomechanically Constrained Piano Chords

See the code used to generate the dataset on my GitHub: https://github.com/Mahesh-Ramani/math-portfolio/tree/main/chord-corpus

## Abstract

This repository hosts a corpus of approximately 19.3 million playable piano chords. Unlike datasets derived from transcription or random sampling, this corpus exhaustively enumerates the two-handed search space subject to strict biomechanical constraints. The generation process enforces hand-span limits (1.5 octaves per hand) and note-count constraints to ensure that every entry represents a physically viable sonic event.

The dataset was generated to facilitate research in computational musicology, generative modeling, and psychoacoustics. It includes calculated statistical moments for voicing shape and psychoacoustic target metrics (Plomp-Levelt Dissonance and Harmonicity).

## Search Space and Generation

The search space is defined by a piano-inspired model adhering to the following constraints:

1. **Range:** The pitch space is bounded by the standard 88-key piano range (MIDI 21–108, A0 to C8).
2. **Hand Span:** To ensure playability, each hand is modeled as a subset of notes fitting within a sliding window of 19 semitones (approximately a 1.5-octave span).
3. **Polyphony:** A chord is defined as the set union of notes generated by a Left Hand (LH) and a Right Hand (RH), where each hand contributes at least one note.

### Sampling Strategy

Due to the combinatorial explosion of possible note combinations, a hybrid generation strategy was employed:

* **Exhaustive Enumeration ($N \in [2, 5]$):** For chords containing 2 to 5 notes, the dataset contains the complete universe of valid two-handed positions.
* **Monte Carlo Sampling ($N \in [6, 10]$):** For chords containing 6 to 10 notes, Monte Carlo sampling (Seed=42) was used to generate 1,000,000 unique, valid instances for each cardinality.
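The constraints above can be sketched as a brute-force playability check. This is a minimal illustration, not the author's generator (that code lives in the GitHub repository linked above). The names `is_playable`, `count_playable`, and `MAX_SPAN` are hypothetical. The sketch assumes the 19-semitone window means 19 adjacent keys, so a hand's highest and lowest notes differ by at most 18 semitones. It also assumes the left hand takes the lowest notes and the right hand the highest, which loses no generality under this span rule.

```python
from itertools import combinations

MIDI_LOW, MIDI_HIGH = 21, 108  # A0..C8, from the Range constraint
MAX_SPAN = 18  # ASSUMPTION: a 19-key window, so max - min <= 18 per hand


def is_playable(notes):
    """Return True if the sorted `notes` can be split into a low group (LH)
    and a high group (RH), both non-empty, each spanning <= MAX_SPAN."""
    for k in range(1, len(notes)):
        lh, rh = notes[:k], notes[k:]
        if lh[-1] - lh[0] <= MAX_SPAN and rh[-1] - rh[0] <= MAX_SPAN:
            return True
    return False


def count_playable(n_notes):
    """Exhaustively count playable chords with exactly `n_notes` distinct keys."""
    keys = range(MIDI_LOW, MIDI_HIGH + 1)
    # combinations() over a sorted range yields sorted tuples.
    return sum(1 for chord in combinations(keys, n_notes) if is_playable(chord))


if __name__ == "__main__":
    for n in (2, 3):
        print(n, count_playable(n))
```

Under these assumptions the sketch counts 3,828 two-note chords and 87,636 three-note chords, which agrees with the exhaustive counts reported for this corpus. Larger cardinalities grow quickly, which is consistent with the switch to Monte Carlo sampling from six notes upward.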
### Dataset Distribution

| Number of Notes ($N$) | Count | Generation Mode |
| :--- | :--- | :--- |
| 2 | 3,828 | Exhaustive |
| 3 | 87,636 | Exhaustive |
| 4 | 1,265,565 | Exhaustive |
| 5 | 12,907,692 | Exhaustive |
| 6 | 1,000,000 | Monte Carlo |
| 7 | 1,000,000 | Monte Carlo |
| 8 | 1,000,000 | Monte Carlo |
| 9 | 1,000,000 | Monte Carlo |
| 10 | 1,000,000 | Monte Carlo |
| **Total** | **19,264,721** | |

## Data Structure

The dataset is provided in CSV format. Each row represents a single chord with the following columns:

* `midi_list`: Space-separated string of MIDI note numbers (e.g., "60 64 67").
* `n_notes`: Integer count of notes in the chord.
* `centroid_midi`: The arithmetic mean of the MIDI pitches.
* `spread_semitones`: The range (max minus min) in semitones.
* `skew`: The asymmetry of the note distribution (3rd standardized moment).
* `kurtosis`: The tailedness of the distribution (4th standardized moment).
* `dissonance_sum`: The sum of all pairwise dissonance interactions.
* `dissonance_mean`: The average pairwise dissonance.
* `harmonicity`: A 0–1 score indicating how closely the chord aligns with a harmonic series.
* `bass_note`: The lowest MIDI note number.
* `treble_note`: The highest MIDI note number.
* `ic_vector`: Space-separated 12-element Interval Class vector (histogram of intervals mod 12).
* `avg_spacing`: The mean size of consecutive intervals in semitones.
* `spacing_std`: The standard deviation of consecutive intervals.
* `generation_mode`: Indicates whether the chord was generated via "ordered" enumeration or "random" sampling.

## Methodology

### Core Dissonance Calculation

Sensory dissonance (roughness) is calculated using the Plomp-Levelt model (1965) as adapted by Sethares.

1. An 88×88 interaction matrix was precomputed for all piano keys to optimize processing.
2. Pairwise dissonance $d$ is modeled as $d(x) = e^{-3.5x} - e^{-5.75x}$, where $x$ represents the frequency difference scaled by the critical bandwidth.
3. The total dissonance of a chord is the sum of all pairwise interactions.

### Harmonicity Calculation

Harmonicity estimates the fit of a chord's frequencies to a single harmonic series. The algorithm tests candidate fundamental frequencies derived from integer divisions of the note frequencies (divisors 1–12). The score represents the deviation from the nearest integer ratios, normalized such that 1.0 represents a perfect harmonic fit and 0.0 represents maximum inharmonicity.

## Analytical Implications

Analysis of this corpus suggests that harmonicity is an intrinsic property determined by pitch-class identity, while dissonance is an extrinsic property highly sensitive to voicing. Regression analysis indicates that statistical **skewness** is a significantly stronger predictor of reduced dissonance than **spread**. This challenges the pedagogical heuristic of "opening up" voicings simply by increasing width; rather, clarity is optimized by negative skewness (larger gaps in the lower register and tighter clustering in the treble).

## License

This dataset is released under the **Creative Commons Attribution 4.0 International (CC BY 4.0)** license.
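The voicing-shape columns listed under Data Structure can be approximated from `midi_list` alone. The sketch below is an illustration under stated assumptions, not the author's code. `describe` is a hypothetical helper, moments are taken as population moments, `kurtosis` is left non-excess (not shifted by 3), and `ic_vector` is read as counts of pairwise intervals modulo 12. Values in the published CSV may differ if the generator follows other conventions.

```python
def describe(midi_list):
    """Compute shape descriptors for a chord given as e.g. "60 64 67".
    Expects at least two distinct notes, as in the corpus."""
    notes = sorted(int(tok) for tok in midi_list.split())
    n = len(notes)
    mean = sum(notes) / n

    # ASSUMPTION: population (divide-by-n) central moments.
    m2 = sum((p - mean) ** 2 for p in notes) / n
    m3 = sum((p - mean) ** 3 for p in notes) / n
    m4 = sum((p - mean) ** 4 for p in notes) / n

    # Gaps between adjacent notes of the sorted chord.
    gaps = [hi - lo for lo, hi in zip(notes, notes[1:])]
    gap_mean = sum(gaps) / len(gaps)
    gap_var = sum((g - gap_mean) ** 2 for g in gaps) / len(gaps)

    # ASSUMPTION: 12-bin histogram of all pairwise intervals mod 12.
    ic = [0] * 12
    for i, lo in enumerate(notes):
        for hi in notes[i + 1:]:
            ic[(hi - lo) % 12] += 1

    return {
        "n_notes": n,
        "centroid_midi": mean,
        "spread_semitones": notes[-1] - notes[0],
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis (assumption)
        "bass_note": notes[0],
        "treble_note": notes[-1],
        "ic_vector": " ".join(str(c) for c in ic),
        "avg_spacing": gap_mean,
        "spacing_std": gap_var ** 0.5,
    }


if __name__ == "__main__":
    for key, value in describe("60 64 67").items():
        print(key, value)
```

A negative `skew` from this helper corresponds to the voicing shape discussed under Analytical Implications: most notes sit high, with a sparse tail toward the bass.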
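The three steps under Core Dissonance Calculation can be sketched as follows. The README does not state how the frequency difference is scaled, so this sketch assumes the constants commonly quoted for Sethares's parametrization (0.24, 0.021, 19), pure tones without overtones, equal amplitudes, and equal temperament at A4 = 440 Hz. All function and constant names are hypothetical, and the results are not guaranteed to match the `dissonance_sum` and `dissonance_mean` columns.

```python
import math

A, B = 3.5, 5.75  # decay rates from d(x) = e^{-3.5x} - e^{-5.75x}
D_STAR, S1, S2 = 0.24, 0.021, 19.0  # ASSUMED critical-bandwidth scaling
MIDI_LOW, MIDI_HIGH = 21, 108  # 88-key range


def midi_to_hz(midi):
    """Equal-tempered frequency, assuming A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12)


def pair_dissonance(m1, m2):
    """Roughness of two pure tones under the assumed scaling."""
    f_lo, f_hi = sorted((midi_to_hz(m1), midi_to_hz(m2)))
    s = D_STAR / (S1 * f_lo + S2)
    x = s * (f_hi - f_lo)
    return math.exp(-A * x) - math.exp(-B * x)


# Step 1: precompute the 88x88 interaction matrix once.
KEYS = range(MIDI_LOW, MIDI_HIGH + 1)
TABLE = [[pair_dissonance(a, b) for b in KEYS] for a in KEYS]


def chord_dissonance(notes):
    """Steps 2 and 3: sum the pairwise lookups; also return the pair mean."""
    total = 0.0
    for i, a in enumerate(notes):
        for b in notes[i + 1:]:
            total += TABLE[a - MIDI_LOW][b - MIDI_LOW]
    n_pairs = len(notes) * (len(notes) - 1) // 2
    return total, total / n_pairs


if __name__ == "__main__":
    print(chord_dissonance([60, 64, 67]))
```

Because every chord draws on the same 88 keys, the table lookup replaces repeated exponentials, which is the optimization the first step describes.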
Provider:
MRamani