MELD-ST
arXiv · Updated 2024-05-22 · Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2405.13233v1
Summary:
MELD-ST is an emotion-aware speech translation dataset created by Kyoto University, covering the English-to-Japanese and English-to-German language pairs. It contains approximately 10,000 emotion-labeled speech segments from the TV series *Friends*, with the emotion labels taken from the MELD dataset. To build it, the creators extracted audio and subtitles from Blu-ray discs and used optical character recognition (OCR) tools for text cleaning and timestamp extraction. MELD-ST is intended for research on emotion-aware speech translation, where the emotion labels are used to improve translation quality, particularly for emotionally expressive utterances.
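To make the role of the emotion labels in the summary above concrete, here is a minimal sketch of working with emotion-labeled translation pairs. It assumes a hypothetical record layout: the `Segment` class and its fields (`segment_id`, `source_text`, `target_text`, `emotion`) are illustrative and are not the dataset's actual file schema. The label set follows MELD's seven emotion categories. `emotional_subset` selects the non-neutral (emotionally expressive) segments, and `tag_source` shows one simple, hypothetical way to condition a model on the label by prefixing it to the source text. This is not necessarily how the MELD-ST authors use the labels.

```python
from collections import Counter
from dataclasses import dataclass

# MELD annotates each utterance with one of these seven emotion categories.
MELD_EMOTIONS = frozenset(
    {"anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"}
)


@dataclass(frozen=True)
class Segment:
    """Hypothetical record layout; the real MELD-ST files may differ."""

    segment_id: str
    source_text: str  # English utterance
    target_text: str  # Japanese or German translation
    emotion: str      # label inherited from MELD

    def __post_init__(self) -> None:
        # Reject labels outside MELD's category set.
        if self.emotion not in MELD_EMOTIONS:
            raise ValueError(f"unknown emotion label: {self.emotion!r}")


def emotional_subset(segments: list[Segment]) -> list[Segment]:
    """Keep only non-neutral segments, i.e. the emotionally expressive ones."""
    return [s for s in segments if s.emotion != "neutral"]


def tag_source(segment: Segment) -> str:
    """Prefix the source with its emotion label (a hypothetical conditioning scheme)."""
    return f"<{segment.emotion}> {segment.source_text}"


if __name__ == "__main__":
    # Made-up toy examples, not taken from MELD-ST.
    data = [
        Segment("ex-1", "Oh my God!", "なんてこと!", "surprise"),
        Segment("ex-2", "Okay.", "わかった。", "neutral"),
        Segment("ex-3", "We did it!", "やった!", "joy"),
    ]
    print(Counter(s.emotion for s in data))
    print([tag_source(s) for s in emotional_subset(data)])
```

Validating labels in `__post_init__` catches misspelled or out-of-set labels when a record is created, before they can silently skew an emotion-specific evaluation.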
Provided by:
Kyoto University, Japan
Created:
2024-05-22