AudioSetCaps | Audio-language multimodal dataset | Multimodal dataset
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models
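Dataset overview
- Data sources: AudioSet, YouTube-8M, VGGSound
- Number of audio files: 6,117,099 ten-second audio files
- Number of Q&A pairs: 18,414,789
Dataset contents
- Audio captions: each audio file comes with one descriptive caption.
- Q&A pairs: each audio file comes with three Q&A pairs, which serve as metadata for generating the final caption.
Examples
ID | Audio | Caption | Q&A 1 | Q&A 2 | Q&A 3 |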
---|---|---|---|---|---|
_7Xe9vD3Hpg_4_10 | <audio controls><source src="Example/_7Xe9vD3Hpg_4_10.mp3" type="audio/mpeg"> Your browser does not support the audio element.</audio> | A solemn instrumental piece unfolds, featuring the melancholic strains of a cello and the resonant tolling of a bell. The initial tempo is slow and deliberate, gradually building intensity with each successive bell ring. | Question: Describe this audio according to the sounds in it. Answer: The music starts with a slow string melody and continues with a bass note. The sound of a bell rings and the music becomes more intense. | Question: Based on the QAs, give some information about the speech, such as the emotion of the speaker, the gender of the speaker, and the spoken language, only if speech is present in this audio. Answer: Im sorry, but there is no speech in the audio. | Question: Based on the QAs, give some information about the music, such as music genre and music instruments, only if music is present in this audio. Answer: The music genre is instrumental. The music instruments are the cello and the bell. |
-TL8Mp3xcUM_0_10 | <audio controls><source src="Example/-TL8Mp3xcUM_0_10.mp3" type="audio/mpeg"> Your browser does not support the audio element.</audio> | A woman expresses strong emotions with a noticeably high-pitched vocal tone. | Question: Describe this audio according to the sounds in it. Answer: A woman speaks with a high-pitched voice. | Question: Based on the QAs, give some information about the speech, such as the emotion of the speaker, the gender of the speaker, and the spoken language, only if speech is present in this audio. Answer: The speech is emotional, as the woman speaks in a high-pitched voice. | Question: Based on the QAs, give some information about the music, such as music genre and music instruments, only if music is present in this audio. Answer: There is no music in this audio. |
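
Each Q&A cell in the examples above follows a `Question: ... Answer: ...` pattern. The following is a minimal Python sketch for splitting such a cell into its two parts, assuming that pattern holds for every entry. The name `parse_qa` is hypothetical and is not part of the dataset's own tooling.

```python
import re

# Assumption: every Q&A cell looks like "Question: <text> Answer: <text>",
# as in the two example rows above.
QA_PATTERN = re.compile(
    r"^Question:\s*(?P<question>.*?)\s*Answer:\s*(?P<answer>.*)$",
    re.DOTALL,
)


def parse_qa(cell: str) -> dict:
    """Split one Q&A cell into a dict with 'question' and 'answer' keys."""
    match = QA_PATTERN.match(cell.strip())
    if match is None:
        raise ValueError(f"cell does not follow the Question/Answer pattern: {cell!r}")
    return {"question": match.group("question"), "answer": match.group("answer")}


if __name__ == "__main__":
    example = (
        "Question: Describe this audio according to the sounds in it. "
        "Answer: A woman speaks with a high-pitched voice."
    )
    print(parse_qa(example))
```

The split happens at the first occurrence of `Answer:`, so a question that itself contained that token would need a stricter delimiter.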
Statistics
Dataset | Audio captions | Q&A captions | Total |
---|---|---|---|
AudioSetCaps | 1910920 | 5736072 | 7646992 |
YouTube-8M | 4023990 | 12086037 | 16110027 |
VGGSound | 182189 | 592680 | 774869 |
Total | 6117099 | 18414789 | 24531888 |
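
The totals in the table can be cross-checked with a few lines of Python. This is only a sanity check on the figures as printed above. `check_table` is an illustrative helper, not part of the dataset release.

```python
# Figures copied from the statistics table: (audio captions, Q&A captions, total).
ROWS = {
    "AudioSetCaps": (1910920, 5736072, 7646992),
    "YouTube-8M": (4023990, 12086037, 16110027),
    "VGGSound": (182189, 592680, 774869),
}
REPORTED_TOTAL = (6117099, 18414789, 24531888)


def check_table(rows, reported_total):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    problems = []
    # Row check: captions + Q&A should equal the row total.
    for name, (captions, qa, total) in rows.items():
        if captions + qa != total:
            problems.append(f"{name}: {captions} + {qa} != {total}")
    # Column check: the per-source figures should sum to the reported totals.
    column_sums = tuple(sum(column) for column in zip(*rows.values()))
    if column_sums != reported_total:
        problems.append(f"column sums {column_sums} != reported {reported_total}")
    return problems


if __name__ == "__main__":
    print(check_table(ROWS, REPORTED_TOTAL) or "table is internally consistent")
```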
Download
License
- Academic use only
Citation
@inproceedings{bai2024audiosetcaps,
  title={AudioSetCaps: Enriched Audio Captioning Dataset Generation Using Large Audio Language Models},
  author={JISHENG BAI and Haohe Liu and Mou Wang and Dongyuan Shi and Wenwu Wang and Mark D Plumbley and Woon-Seng Gan and Jianfeng Chen},
  booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
  year={2024},
  url={https://openreview.net/forum?id=uez4PMZwzP}
}

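1 km monthly precipitation dataset for China (1901-2024)
This dataset provides monthly precipitation for China at a spatial resolution of 0.0083333° (about 1 km), covering January 1901 to December 2024. The data are in NetCDF (.nc) format. The dataset was produced by downscaling over China with the Delta spatial downscaling scheme, based on the global 0.5° climate dataset released by CRU and the global high-resolution climate dataset released by WorldClim. It was validated against data from 496 independent meteorological observation stations, and the validation results are credible. The dataset covers the main land area of China (including Hong Kong, Macao, and Taiwan) and excludes areas such as the South China Sea islands and reefs. To save storage, values are stored in the nc files as int16, with precipitation in units of 0.1 mm. The nc data can be opened and mapped in ArcMap, and can be extracted and processed in MATLAB, which provides functions for reading and writing nc files. The read function is ncread. After switching to the folder that holds the nc file, the call is: ncread('XXX.nc', 'var', [i j t], [leni lenj lent]), where XXX.nc is the file name (a string, so it must be quoted); var is the name of the variable to read from XXX.nc (also a quoted string); i, j, and t are the starting row, column, and time index; and leni, lenj, and lent are the lengths to read along the row, column, and time dimensions. Any area and any period within the study region can be read this way. MATLAB's help lists many other commands for nc data. WGS84 is the recommended coordinate system.
Indexed by: National Tibetan Plateau Data Center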
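
For readers working in Python rather than MATLAB, the windowed read and the 0.1 mm unit described for the precipitation dataset above can be sketched as follows. This is a sketch under stated assumptions, not the data provider's method: it relies on the third-party `netCDF4` package, the variable name passed in is a placeholder just like `'var'` in the MATLAB call, fill values are not handled, and the helper names are hypothetical. Python libraries may also list the dimensions in a different order than MATLAB does, so the order of `start` and `count` should be checked against the actual file.

```python
from typing import Sequence, Tuple

UNIT_MM = 0.1  # per the description: stored int16 values are in units of 0.1 mm


def window_to_slices(start: Sequence[int], count: Sequence[int]) -> Tuple[slice, ...]:
    """Convert 1-based (start, count) windows, as used by ncread, to 0-based slices."""
    if len(start) != len(count):
        raise ValueError("start and count must have the same length")
    if any(s < 1 for s in start) or any(c < 1 for c in count):
        raise ValueError("start indices are 1-based and counts must be positive")
    return tuple(slice(s - 1, s - 1 + c) for s, c in zip(start, count))


def to_millimetres(raw_value):
    """Scale a raw stored value (units of 0.1 mm) to millimetres."""
    return raw_value * UNIT_MM


def read_precip_mm(path: str, var_name: str, start: Sequence[int], count: Sequence[int]):
    """Read a window of raw values and return it in millimetres.

    Assumptions: `var_name` is whatever the file actually calls its
    precipitation variable, and the window contains no fill values.
    """
    import netCDF4  # third-party; imported lazily so the helpers work without it

    with netCDF4.Dataset(path) as dataset:
        variable = dataset.variables[var_name]
        # Read the stored integers as-is, then apply the documented unit.
        variable.set_auto_maskandscale(False)
        raw = variable[window_to_slices(start, count)]
    return raw * UNIT_MM


if __name__ == "__main__":
    # A window starting at row 1, column 1, time step 13, spanning 10 x 20 x 12.
    print(window_to_slices([1, 1, 13], [10, 20, 12]))
```

Turning off automatic scaling keeps the conversion explicit. If a file already carries its own scale attribute, the extra multiplication would have to be dropped.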
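QIN LUNG CT lung cancer CT imaging dataset
The TCIA-QIN-LUNG dataset is a lung cancer CT imaging dataset containing 3,954 images from 47 patients, organized as 47 series across 47 studies.
Indexed by: HyperAI (超神经)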
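China regional transportation network dataset
This dataset contains transportation network information for regions across China, including the network structure and connectivity of road, rail, air, and waterway transport. It records the location of each transport node, along with the type, length, and capacity of each route and the associated traffic flow information.
Indexed by: data.stats.gov.cn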
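Spatial distribution dataset of provincial disaster statistics in China (1999-2020)
This dataset gives the spatial distribution of provincial disaster statistics in China for 1999-2020. It contains detailed data for each province on six categories of disaster: natural disasters, geological disasters, earthquakes, forest fires, forest diseases, pests and rodents, and grassland disasters. The data volume is 206 MB, in Excel format.
Indexed by: National Earth System Science Data Center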
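TCIA
TCIA (The Cancer Imaging Archive) is a public archive of cancer imaging data that holds medical images for many cancer types, such as CT, MRI, and PET. The images are usually paired with clinical and pathology information and are used in cancer research and clinical trials.
Indexed by: www.cancerimagingarchive.net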