English corpus of focused speech and text

Name: English corpus of focused speech and text
Creator: Nara Institute of Science and Technology
Published: 2022-09-29 09:40:11
License: Not specified

Source: arXiv. Updated: 2022-09-29. Indexed: 2024-06-21.

Download link:

https://dsc-nlp.naist.jp/data/speech/paralinguistic_paraphrase/

Description:

This study presents an English corpus of speech samples with different focus placements, each paired with text that reflects the implied meaning of the utterance. The goal is to preserve paralinguistic information by mapping it into the linguistic domain of the source language through lexical and grammatical means. The corpus contains 3,423 speech samples collected via Amazon Mechanical Turk. Its construction involved text design followed by speech collection, with each speech sample constrained to carry exactly one sentence stress. The corpus is intended to advance research on paralinguistic translation, addressing the fact that existing speech translation systems do not take paralinguistic information into account.

Provider:

Nara Institute of Science and Technology

Created:

2022-03-29
