
Griko-Italian speech translation corpus

Source: arXiv. Updated: 2018-07-28. Indexed: 2024-06-21.
Download link:
http://griko.project.uoi.gr
Description:
The Griko-Italian speech translation corpus is a small parallel corpus for Griko, an endangered language, created at the Grenoble Informatics Laboratory (Laboratoire d'Informatique de Grenoble, LIG). It contains 330 speech recordings totaling approximately 20 minutes. Each recording comes with an Italian translation and with word-level alignments linking the speech signal to its transcription and to the translation. The dataset also includes morphosyntactic tags, word-level annotations, and pseudo-phones generated by automatic unit discovery methods. The corpus is intended to support computational language documentation research, particularly zero-resource tasks such as speech-to-translation alignment and unsupervised word discovery.
Provided by:
Grenoble Informatics Laboratory (Laboratoire d'Informatique de Grenoble, LIG), Université Grenoble Alpes, France
Created:
2018-07-28