five

UMD-350MB: Refined MIDI Dataset for Symbolic Music Generation

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/13125873
下载链接
链接失效反馈
官方服务:
资源简介:
UMD-350MB The Universal MIDI Dataset 350MB (UMD-350MB) is a proprietary collection of 85,618 MIDI files curated for research and development within our organization. This collection is a subset sampled from a larger dataset developed for pretraining symbolic music models. The field of symbolic music generation is constrained by limited data compared to language models. Publicly available datasets, such as the Lakh MIDI Dataset, offer large collections of MIDI files sourced from the web. While the sheer volume of musical data might appear beneficial, the actual amount of valuable data is less than anticipated, as many songs contain less desirable melodies with erratic and repetitive events. The UMD-350MB employs an attention-based approach to achieve more desirable output generations by focusing on human-reviewed training examples of single-track melodies, chord progressions, leads and arpeggios with an average duration of 8 bars. This was achieved by refining the dataset over 24 months, ensuring consistent quality and tempo alignment. Moreover, the dataset is normalized by setting the timing information to 120 BPM with a tick resolution (PPQ) of 96 and transposing the musical scales to C major and A minor (natural scales). Melody Styles A major portion of the dataset is composed of newly produced private data to represent modern musical styles. Pop: 1970s to 2020s Pop music EDM: Trance, House, Synthwave, Dance, Arcade Jazz: Bebop, Ballad, Latin-Jazz, Bossa-Jazz, Ragtime Soul: 80s Classic, Neo-Soul, Latin-Soul Urban: Pop, Hip-Hop, Trap, R&B, Afrobeat World: Latin, Bossa Nova, European Other: Film, Cinematic, Game music and piano references Actual MIDI files are unlabeled for unsupervised training. Dataset Access Please note that this is a closed-source dataset with very limited access. Considerations for access include proposals for data augmentation, chord extraction and other enhancement methods, whether through scripts, algorithmic techniques, manual editing in a DAW or additional processing methods. For inquiries about this dataset, please email us.
创建时间:
2024-07-29
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作