litagin/Galgame_Speech_SER_16kHz
Source: Hugging Face. Updated: 2024-11-10. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/litagin/Galgame_Speech_SER_16kHz
Description:
Galgame_Speech_SER_16kHz is a Japanese speech, text, and emotion dataset extracted from Japanese visual novels (Galgames), intended for training Speech Emotion Recognition (SER) models. It contains 3,746,131 audio files totaling 5,353 hours (104 GB). It extends litagin/Galgame_Speech_ASR_16kHz with emotion labels. A local LLM assigned those labels from the text transcriptions alone, so they may be inaccurate. The audio is stored as 16 kHz, 16-bit, mono OGG files in WebDataset format. The description also highlights limitations of the dataset, such as the accuracy of the emotion labels, audio quality, and gender bias. Use is governed by the GNU General Public License v3.0; according to this listing, commercial use is prohibited and models trained on the dataset must be open-sourced.
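Since the description says the data is stored in WebDataset format, the following is a minimal sketch of how such a shard could be read using only the Python standard library. A WebDataset shard is a plain tar archive in which consecutive files sharing a basename form one sample. The file extensions used here (`ogg`, `txt`, `cls`), the sample keys, and the helper names `iter_samples` and `build_demo_shard` are assumptions for illustration; the listing does not state the actual layout, so check the dataset card before relying on them.

```python
import io
import tarfile


def iter_samples(fileobj):
    """Yield (key, {extension: bytes}) for each sample in a WebDataset-style tar.

    Files that share a basename (text before the first dot) and are stored
    consecutively in the archive are grouped into one sample.
    """
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            directory, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{directory}/{stem}" if directory else stem
            # A change of key marks the start of the next sample.
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield current_key, sample


def build_demo_shard():
    """Build a tiny in-memory shard with placeholder payloads (hypothetical layout)."""
    files = [
        ("0001.ogg", b"fake-audio-1"),            # placeholder, not real OGG data
        ("0001.txt", "こんにちは".encode("utf-8")),  # transcription
        ("0001.cls", b"happy"),                   # assumed emotion-label file
        ("0002.ogg", b"fake-audio-2"),
        ("0002.txt", "さようなら".encode("utf-8")),
        ("0002.cls", b"sad"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for key, sample in iter_samples(build_demo_shard()):
        print(key, sorted(sample), sample["txt"].decode("utf-8"))
```

To read a real shard, an opened file such as `open("shard.tar", "rb")` (a hypothetical filename) can be passed in place of the demo buffer. Because the emotion labels were produced from text only, it may be worth treating them as noisy targets when training.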
Provider:
litagin



