Motiv: A Dataset of Latent Space Representations of Musical Phrase Motions
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/RWCG4B
Description:
This study introduces a novel approach to analyzing musical motions through the creation of the Motiv dataset. The dataset was constructed in four steps: selecting professional saxophonists, defining the source materials, establishing parameters for musical motions, and modeling the musical phrases in a latent space.

The study involved four highly skilled saxophonists performing mixed music works, particularly on the tenor saxophone. They recorded three musical phrases from "Lamento" by Jesús Villa-Rojo, each representing different emotional and technical characteristics. The saxophonists were then guided to record variations of the original phrases, classified into three motion types (parallel, oblique, and contrary) according to guidelines that left room for interpretation. These transformations capture nuanced changes in dynamics, articulation, pitch, and rhythm while maintaining temporal coherence.

The dataset includes the recorded audio samples and their latent space representations, which were generated using a RAVE model. The model encodes the audio into a structured representation of its spectral and temporal characteristics. Each sample is annotated with details of its motion transformation and is accompanied by musical scores for reference. The data is stored in HDF5 format and includes both the waveform and the latent vector data. The dataset is publicly available for research purposes, enabling deeper exploration of musical motion and its interaction with latent space models.
The Motiv dataset lays the groundwork for exploring the role of latent spaces in understanding and synthesizing thematic elaboration, with a specific focus on the geometric relationships between three motion types: parallel, oblique, and contrary. By utilizing a RAVE model to map the recorded audio into latent space, we present a structured representation of musical phrases that enables the analysis of these motion types and their variations.
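As a rough illustration of what a geometric comparison in latent space could look like, the sketch below measures the cosine similarity between the net displacements of two latent trajectories. It is only an assumption-laden example: the function names, the choice of net displacement as a summary, and the toy two-dimensional vectors are all hypothetical and are not taken from the dataset or from the authors' method, and no mapping from similarity values to the parallel, oblique, or contrary labels is implied.

```python
import math


def net_displacement(trajectory):
    """Return last frame minus first frame of a latent trajectory.

    `trajectory` is a list of equal-length latent vectors (one per frame).
    Hypothetical helper: the dataset's actual latent layout may differ.
    """
    first, last = trajectory[0], trajectory[-1]
    return [b - a for a, b in zip(first, last)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 2-D trajectories, invented for illustration (not dataset values).
original = [[0.0, 0.0], [1.0, 1.0]]
same_direction = [[0.0, 0.0], [2.0, 2.0]]
opposite_direction = [[0.0, 0.0], [-1.0, -1.0]]

sim_same = cosine_similarity(
    net_displacement(original), net_displacement(same_direction)
)
sim_opposite = cosine_similarity(
    net_displacement(original), net_displacement(opposite_direction)
)
```

Here `sim_same` comes out close to 1 and `sim_opposite` close to -1, because the toy trajectories move in the same and in opposite directions. With real latent vectors, a frame-by-frame comparison may be more informative than a single net displacement.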
Created:
2025-05-12



