Griko-Italian speech translation corpus

Name: Griko-Italian speech translation corpus
Creator: Grenoble Information Laboratory, Grenoble Alpes University, France
Published: 2018-07-28 01:29:20
License: No description available

Source: arXiv; updated 2018-07-28; indexed 2024-06-21

Download link:

http://griko.project.uoi.gr

Description:

The Griko-Italian speech translation corpus is a small parallel corpus for the endangered language Griko, created by the Grenoble Information Laboratory. It contains 330 speech recordings totaling approximately 20 minutes. Each recording comes with an Italian translation and word-level alignments between the speech signal, its transcription, and the translation. The dataset also includes morphosyntactic tags, word-level annotations, and pseudo-phones generated by automatic unit discovery methods. The corpus is intended to support computational language documentation research, particularly zero-resource tasks such as speech-to-translation alignment and unsupervised word discovery.

Provider:

Grenoble Information Laboratory, Grenoble Alpes University, France

Created:

2018-07-28
