
Image and sound data from film Fantasia produced by Walt Disney

Mendeley Data · Updated 2024-06-29 · Indexed 2024-06-28
Download link:
https://figshare.com/articles/FantasiaDisney_ImageSound/5999207
Resource description:
This repository contains the data used in the article "Convolutional neural networks and transfer learning applied to automatic composition of descriptive music", published at the 15th International Conference on Distributed Computing and Artificial Intelligence (DCAI). The data structure is explained in detail in the article. This proposal continues an earlier work whose data are available in a GitHub repository.

Abstract

Visual and musical arts have been strongly interconnected throughout history. The aim of this work is to compose music on the basis of the visual characteristics of a video. For this purpose, descriptive music is used as a link between image and sound, and a video fragment of the film Fantasia is analyzed in depth. Specifically, convolutional neural networks (CNNs) in combination with transfer learning are applied to extract image descriptors. To establish a relationship between the visual and musical information, Naive Bayes, Support Vector Machine and Random Forest classifiers are applied. The resulting model is subsequently used to compose descriptive music from a new video. The results of this proposal are compared with those of the earlier work in order to evaluate the performance of the classifiers and the quality of the descriptive musical composition.

DATA

train_data.arff: Image descriptors and the most important sound of each frame from the fragment "The Nutcracker Suite" in the film Fantasia, obtained by means of CNNs. Data stored in ARFF format.

test_data.arff: Image descriptors of each frame from the fragment "The Firebird" in the film Fantasia 2000, obtained by means of CNNs. Data stored in ARFF format.

midi.csv: Frame number of the fragment "The Firebird" in the film Fantasia 2000 and the sound predicted by the system, encoded in MIDI. Data stored in CSV format.

firebird_prediction.mp3: Audio file synthesized from the prediction data for the fragment "The Firebird" of the film Fantasia 2000.

LICENSE

Data is available under the MIT License. To make use of the data, the article must be cited.
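As a rough illustration of how the ARFF and CSV files described above might be read, here is a minimal Python sketch. It is not the authors' code. The attribute names (d0, d1, sound), the CSV column names (frame, midi), the position of the sound label as the last attribute, and the sample values are all assumptions made up for this example; the real layout is documented in the article. The ARFF reader is deliberately simplified (dense rows, no quoting); a fuller reader such as scipy.io.arff.loadarff would be a more robust choice for the actual files.

```python
import csv
import io


def read_simple_arff(text):
    """Parse a dense ARFF document into (attribute_names, rows).

    Simplified on purpose: it ignores quoting, sparse rows, and
    type information, which a full ARFF reader would handle.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


# Hypothetical miniature stand-ins for train_data.arff and midi.csv.
# Attribute names, column names, and values are invented for illustration.
SAMPLE_ARFF = """\
@relation frames
@attribute d0 numeric
@attribute d1 numeric
@attribute sound {60,69}
@data
0.12,0.80,60
0.33,0.41,69
"""

SAMPLE_MIDI_CSV = "frame,midi\n1,69\n2,57\n"

names, rows = read_simple_arff(SAMPLE_ARFF)
# Assumption: the sound label is the last attribute of each row.
features = [[float(v) for v in row[:-1]] for row in rows]
labels = [row[-1] for row in rows]

# Assumption: midi.csv has a header row with "frame" and "midi" columns.
predictions = [
    (int(r["frame"]), int(r["midi"]))
    for r in csv.DictReader(io.StringIO(SAMPLE_MIDI_CSV))
]
frequencies = [midi_to_hz(note) for _, note in predictions]
```

To try this on the downloaded files, the sample strings could be replaced with the contents of train_data.arff and midi.csv, after adjusting the column names to whatever the files actually use.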

Created:
2023-06-28