Audio-Visual Cross-Modal Correlation Dataset
National Basic Science Data Center (国家基础学科公共科学数据中心), indexed 2024-03-05
Download link:
https://www.nbsdc.cn/general/dataDetail?id=64ef8408bb16e0591d024e37&type=1
Resource description:
This audio-visual cross-modal correlation dataset is designed primarily for research on interactive control of light-emitting dot matrices based on musical perceptual features, with the goal of audio-visual fusion. On the auditory side, timbre is the perceptual feature for which data are collected; on the visual side, the perceptual features are color, texture, and color matching. The correlations between these auditory and visual perceptual features are studied separately.
1. Timbre materials: 72 audio segments recorded in anechoic chambers by teachers and students from music conservatories, each segment corresponding to one musical instrument. The instruments comprise 36 traditional Chinese instruments (non-ethnic minority), 12 traditional Chinese ethnic minority instruments, and 24 Western instruments. Each timbre is rated and annotated along 5 dimensions: bright/dull, dry/soft, sharp/sonorous, rough/pure, and hoarse/consonant.
2. Texture materials: A public texture library from the field of machine vision is used. Starting from the 111 textures of the Brodatz texture library, images with large brightness differences or semantic content are removed, and clustering and multi-scale analysis then yield 15 texture images.
3. Color materials: The HSV color space is adopted. The red-green dimension, yellow-blue dimension, saturation, and brightness are selected to describe the distribution of the color materials. Eight hues are selected, corresponding to 0°, 30°, 60°, 90°, 120°, 180°, 240° and 300° on the color wheel. On this basis, experimental color block diagrams are designed. For red, the top-left block has 100% saturation and 100% brightness; the top-right block has 50% saturation and 100% brightness; the bottom-left block has 100% saturation and 50% brightness; the bottom-right block has 50% saturation and 50% brightness. Following this design, each hue corresponds to four colors, so the 8 hues yield 32 color blocks, with the hue, saturation and brightness of each color block varying independently.
4. Color matching materials: Each sample is a combination of three colors. A total of 50 samples are selected from the three-color matching groups of the Japan Color Design Institute (日本色彩设计研究所).
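The color block design in item 3 can be reproduced numerically. The following is a minimal sketch, not the dataset authors' code: it assumes the listed hue angles and the 100%/50% saturation and brightness levels map directly onto standard HSV coordinates, and the names `HUES_DEG`, `LEVELS` and `color_blocks` are hypothetical.

```python
import colorsys

# Hue angles (degrees) and saturation/brightness levels taken from item 3.
HUES_DEG = [0, 30, 60, 90, 120, 180, 240, 300]
LEVELS = [1.0, 0.5]  # 100% and 50%


def color_blocks():
    """Return one dict per color block with its HSV settings and 8-bit RGB."""
    blocks = []
    for hue in HUES_DEG:
        # Per hue, the order is top-left, top-right, bottom-left, bottom-right:
        # brightness changes between rows, saturation between columns.
        for value in LEVELS:
            for sat in LEVELS:
                r, g, b = colorsys.hsv_to_rgb(hue / 360.0, sat, value)
                rgb = (round(r * 255), round(g * 255), round(b * 255))
                blocks.append({"hue": hue, "s": sat, "v": value, "rgb": rgb})
    return blocks


if __name__ == "__main__":
    for block in color_blocks():
        print(block)
```

Under these assumptions the function returns 8 × 4 = 32 blocks, matching the count stated in item 3. The actual image files in the dataset may have been produced with a different tool or color profile.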
The above timbre, texture, color and color matching materials constitute the audio-visual cross-modal correlation dataset, which is used for audio-visual cross-modal subjective evaluation experiments and audio-visual objective feature extraction, providing data support for the construction of audio-visual correlation models.
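As a quick consistency check on the counts given above, the composition of the dataset can be written down as a small manifest. This is only an illustrative sketch: the dictionary keys and the `summarize` helper are hypothetical and do not reflect the file layout of the downloadable archive.

```python
# Stimulus counts per material type, as stated in the description.
STIMULI = {"timbre": 72, "texture": 15, "color": 32, "color_matching": 50}

# Breakdown of the 72 timbre recordings by instrument group.
INSTRUMENTS = {
    "chinese_traditional_non_minority": 36,
    "chinese_ethnic_minority": 12,
    "western": 24,
}

# The five bipolar timbre annotation dimensions.
TIMBRE_DIMENSIONS = [
    "bright/dull",
    "dry/soft",
    "sharp/sonorous",
    "rough/pure",
    "hoarse/consonant",
]


def summarize():
    """Check that the instrument groups add up and return the total stimulus count."""
    if sum(INSTRUMENTS.values()) != STIMULI["timbre"]:
        raise ValueError("instrument groups do not sum to the timbre count")
    return sum(STIMULI.values())


if __name__ == "__main__":
    print("total stimuli:", summarize())
```

The three instrument groups sum to the 72 timbre recordings, and the four material types together give 169 stimuli.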
Provided by:
Communication University of China
Collected Summary
Dataset Introduction

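Background and Challenges
Background Overview
This dataset is a resource for audio-visual cross-modal correlation research, mainly supporting the fusion analysis of musical perceptual features and visual features. It contains four types of material: timbre, texture, color, and color matching. The timbre material covers a range of instruments and carries timbre evaluation annotations, while the visual material is designed so that perceptual features can be extracted from it. Through subjective evaluation and objective feature extraction, the data are intended to provide a basis for building audio-visual correlation models.
The above content was collected and summarized by 遇见数据集.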



