korean-voice-emotion-dataset

Name: korean-voice-emotion-dataset
Creator: maas
Published: 2025-12-05 16:44:13
License: no description available

ModelScope community: updated 2025-12-05, indexed 2025-08-16

Download link:

https://modelscope.cn/datasets/Kratos-AI/korean-voice-emotion-dataset

Resource description:

# Korean Voice Emotion Dataset

*This dataset contains high-quality (“A-grade”) data. It has been carefully curated, cleaned, and verified to ensure accuracy, completeness, and consistency, making it suitable for high-stakes or production-grade model training.*

## Dataset Summary

This dataset comprises high-quality Korean speech recordings designed for training and evaluating Speech Emotion Recognition (SER) models. The dataset contains voice samples expressing four distinct emotions: **Angry**, **Happy**, **Sad**, and **Surprised**. Each recording is categorized by speaker demographics (age: Young/Old, gender: Male/Female), providing a comprehensive resource for emotion classification research in Korean speech.

## Contact

For queries or collaborations related to this dataset, contact:

- anoushka@kgen.io
- abhishek.vadapalli@kgen.io

## Supported Tasks

- **Task Categories**: Speech Emotion Recognition (SER)
- **Supported Tasks**:
  - Emotion classification from speech
  - Audio signal processing for affective computing
  - Speaker demographic analysis
  - Cross-cultural emotion recognition research
  - Voice synthesis with emotional expression (secondary use)

## Languages

- **Primary Language**: Korean (ko)

## Dataset Structure

### Data Instances

Each instance consists of an audio file and corresponding metadata:

- **Audio**: WAV format audio files
- **Naming Convention**: `{Age}_{Gender}{ID}_{Emotion}.wav` (e.g., `Young_Female1_Happy.wav`)
- **Total Samples**: 66 recordings

### Data Fields

The dataset includes:

- `audio`: Audio file in WAV format
- `age`: Speaker age group (`Young` or `Old`)
- `gender`: Speaker gender (`Male` or `Female`)
- `emotion`: Expressed emotion (`Angry`, `Happy`, `Sad`, `Surprised`)

### Data Distribution

- **Emotions**: 4 classes (Angry, Happy, Sad, Surprised)
- **Demographics**:
  - Young speakers: 44 samples (28 Female, 16 Male)
  - Old speakers: 22 samples (12 Female, 10 Male)
- **Gender Split**: 40 Female, 26 Male recordings

## Dataset Creation

### Curation Rationale

This dataset was created to advance Korean speech emotion recognition research by providing labeled emotional speech samples across different demographic groups. The inclusion of age and gender metadata enables research into how emotional expression varies across demographics in Korean speech patterns.

### Source Data

- **Contributors**: Korean native speakers across different age groups and genders
- **Recording Guidelines**: Speakers were asked to express specific emotions naturally in Korean, ensuring authentic emotional expression while maintaining audio quality standards.

### Annotations

- **Annotation Process**: Each audio file is manually labeled with emotion, age group, and gender information
- **Annotators**: Native Korean speakers familiar with emotional expression patterns
- **Quality Control**: Multiple validation steps to ensure emotion labels match audio content

## Considerations for Using the Data

### Social Impact of Dataset

This dataset aims to advance Korean speech emotion recognition technology, potentially improving human-computer interaction in Korean-speaking contexts. Applications include mental health monitoring, customer service automation, and educational technology with culturally-aware emotional understanding.
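
The naming convention given under Data Instances, `{Age}_{Gender}{ID}_{Emotion}.wav`, carries the same labels as the `age`, `gender`, and `emotion` fields, so the metadata can be recovered from a file name alone. The following is a minimal sketch, not the dataset's official loader. It assumes that `{ID}` is a decimal integer and that labels are capitalized exactly as in the card; `Recording` and `parse_filename` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class Recording(NamedTuple):
    """Metadata recovered from one file name (hypothetical helper type)."""
    age: str
    gender: str
    speaker_id: int
    emotion: str


# Assumption: the ID is one or more decimal digits, and labels use the
# exact capitalization listed in the card (Young/Old, Male/Female, ...).
_PATTERN = re.compile(
    r"^(?P<age>Young|Old)_"
    r"(?P<gender>Male|Female)(?P<id>\d+)_"
    r"(?P<emotion>Angry|Happy|Sad|Surprised)\.wav$"
)


def parse_filename(name: str) -> Recording:
    """Split '{Age}_{Gender}{ID}_{Emotion}.wav' into its labeled parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return Recording(
        age=match["age"],
        gender=match["gender"],
        speaker_id=int(match["id"]),
        emotion=match["emotion"],
    )


print(parse_filename("Young_Female1_Happy.wav"))
# Recording(age='Young', gender='Female', speaker_id=1, emotion='Happy')
```

Rejecting names that do not match, rather than guessing, makes mislabeled or stray files visible early, which matters in a dataset of only 66 recordings.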
### Discussion of Biases

- **Demographic Imbalance**: More recordings from young speakers (67%) than from old speakers (33%), and more from female speakers (61%) than from male speakers (39%)
- **Emotion Representation**: Equal distribution across four emotions, but this may not reflect natural emotion frequency in real-world scenarios
- **Cultural Context**: Emotions are expressed within Korean cultural norms, which may differ from other cultures

### Other Known Limitations

- **Size**: Relatively small dataset (66 samples) may limit model generalization
- **Emotion Categories**: Limited to four basic emotions; complex or mixed emotions are not represented
- **Audio Quality**: Variations in recording conditions may affect model performance
- **Regional Dialects**: May not represent all Korean regional speech patterns

## Intended Uses

### ✅ Direct Use

- Training and benchmarking Speech Emotion Recognition models for Korean
- Research in cross-cultural emotion recognition
- Development of Korean-language affective computing applications
- Academic research in computational linguistics and psychology

### ❌ Out-of-Scope Use

- Real-time production systems without additional validation
- Clinical or diagnostic applications for mental health
- Commercial use without proper attribution
- Surveillance or privacy-invasive applications

## Considerations and Limitations

- ❗ The dataset is small (66 samples) and may not be fully representative of all Korean speakers
- 💡 Emotional expression is culturally and individually variable; results may differ across listeners
- 🔄 Future versions could benefit from regional dialect inclusion and larger sample sizes
- ⚖️ Demographic imbalances should be considered when training models

## License

CC BY 4.0
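
The percentages quoted under Discussion of Biases can be reproduced from the counts listed under Data Distribution. The sketch below does that arithmetic and then derives per-group sample weights using the common "balanced" heuristic, `total / (n_groups * count)`. The weighting scheme is an assumption added for illustration; the card does not prescribe a way to handle the imbalance.

```python
# Counts copied from the Data Distribution section of the card.
counts = {
    ("Young", "Female"): 28,
    ("Young", "Male"): 16,
    ("Old", "Female"): 12,
    ("Old", "Male"): 10,
}
total = sum(counts.values())  # 66 recordings


def share(position: int, label: str) -> int:
    """Rounded percentage of recordings whose key has `label` at `position`."""
    part = sum(n for key, n in counts.items() if key[position] == label)
    return round(100 * part / total)


young_pct, old_pct = share(0, "Young"), share(0, "Old")        # 67, 33
female_pct, male_pct = share(1, "Female"), share(1, "Male")    # 61, 39

# 66 is not a multiple of 4, so the four emotion classes cannot hold
# exactly the same number of recordings; the card lists no per-emotion counts.
emotion_remainder = total % 4  # 2

# Illustrative "balanced" weights (an assumption, not the authors' method):
# rarer demographic groups receive proportionally larger weights.
weights = {key: total / (len(counts) * n) for key, n in counts.items()}

for key, weight in weights.items():
    print(key, counts[key], round(weight, 3))
```

With these weights each of the four demographic groups contributes the same total weight, which is one simple way to keep the majority group (young female speakers) from dominating a training loss.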

Provider:

maas

Created:

2025-08-01

Collected summary

Dataset introduction